{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "ecd838ea",
   "metadata": {},
   "source": [
    "## Report 2：Titanic (English Version)\n",
    "\n",
    "### 1. Description and Purpose\n",
    "\n",
    "#### 1.1 Problem Backgrounds and Tasks\n",
    "\n",
    "  The sinking of the RMS Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the Titanic sank after colliding with an iceberg, killing 1502 out of 2224 passengers and crew. This sensational tragedy shocked the international community and led to better safety regulations for ships.\n",
    "\n",
    "  One of the reasons that the shipwreck led to such loss of life was that there were not enough lifeboats for the passengers and crew. Although there was some element of luck involved in surviving the sinking, some groups of people were more likely to survive than others, such as women, children, and the upper-class.\n",
    "\n",
    "  In this challenge, we ask you to complete the analysis of what sorts of people were likely to survive. In particular, we ask you to apply the tools of machine learning to predict which passengers survived the tragedy.\n",
    "\n",
    "#### 1.2 Requirements and Data Description\n",
    "\n",
    "  There are two groups in the given data, respectively are the training set and the test set. The training set should be used to build your machine learning models. It provides the outcome (also known as the ground truth) for each passenger. Your model will be based on features like passengers' gender and class. You can also use feature engineering to create new features.\n",
    "\n",
    "  The test set should be used to see how well your model performs on unseen data. For the test set, it does not provide the ground truth for each passenger. You should use the model you trained to predict whether each passenger in the test set will survived the sinking of the Titanic. And attachment file gender_submission.csv shows how your submission file should look like.\n",
    "\n",
    "* **Data description of the test set are listed below:**"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "16a30ce0",
   "metadata": {},
   "source": [
    "![1.1](figure/1.1.jpg)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dd39e4d5",
   "metadata": {},
   "source": [
    "**Variable notes:**\n",
    "\n",
    "***pclass:*** A proxy for socio-economic status (SES)\n",
    ">1st = Upper 2nd = Middle 3rd = Lower\n",
    "\n",
    "***age:*** Age is fractional if less than 1. If the age is estimated, is it in the form of xx.5\n",
    "\n",
    "***sibsp:*** The dataset defines family relations in this way...\n",
    ">Sibling = brother, sister, stepbrother, stepsister\n",
    "\n",
    ">Spouse = husband, wife (mistresses and fiancés were ignored)\n",
    "\n",
    "***parch:*** The dataset defines family relations in this way...\n",
    ">Parent = mother, father\n",
    "\n",
    ">Child = daughter, son, stepdaughter, stepson\n",
    "\n",
    ">Some children travelled only with a nanny, therefore parch=0 for them.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "882d05a4",
   "metadata": {},
   "source": [
    "### 2. Background Knowledge\n",
    "\n",
    "#### 2.1 Seaborn: Data Visualizing Tool Box\n",
    "\n",
    "**seaborn.FacetGrid function and important parameters**\n",
    "\n",
    "![1.2](figure/1.2.jpg)\n",
    "\n",
    "**Through this function we can plot several figures across different dataset features.**\n",
    "\n",
    "Some important function parameters used in the program are listed below\n",
    "\n",
    "***data: DataFrame***\n",
    ">The dataset read in by the function\n",
    "\n",
    "***row, col, hue: strings***\n",
    ">Variables that define subsets of the data, which will be drawn on separate facets.\n",
    "\n",
    "***height: scalar***\n",
    ">Height (in inches) of each facet.\n",
    "\n",
    "***aspect: scalar***\n",
    ">Aspect ratio of each facet, so that ‘aspect * height’ gives the width of each facet in inches.\n",
    "\n",
    "***legend_out: bool***\n",
    ">If True, the figure size will be extended, and the legend will be drawn outside the plot on the center right.\n",
    "\n",
    "***{row,col,hue}_order: lists***\n",
    ">Order for the levels of the faceting variables. By default, this will be the order that the levels appear in data or, if the variables are pandas categoricals, the category order.\n",
    "\n",
    "And this function is often used together with matplotlib, which can be seen in the following section 3.3."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5cce2c0d",
   "metadata": {},
   "source": [
    "### 3. Data Analysis and Visualizations \n",
    "\n",
    "#### 3.1 Data Preview\n",
    "\n",
    "In data classification problems, data analysis can be a vital process. So before training the model, we must discuss the characteristics of the data set thoroughly. We can use following commands to acquire and explore the data set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "a33a089f",
   "metadata": {},
   "outputs": [],
   "source": [
    "#data analyzing and wrangling\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import random as rnd\n",
    "\n",
    "# visualizing tools （results has been displayed in the notebook）\n",
    "import seaborn as sns\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1d58acf2",
   "metadata": {},
   "source": [
    "And acquiring data set from the csv files"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "b0bc2781",
   "metadata": {},
   "outputs": [],
   "source": [
    "data_train=pd.read_csv(\"data/train.csv\")\n",
    "data_test=pd.read_csv(\"data/test.csv\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "daaed547",
   "metadata": {},
   "source": [
    "Now we can have a brief preview of the data using pandas"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "1048f4d5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PassengerId</th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Braund, Mr. Owen Harris</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>A/5 21171</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17599</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C85</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Heikkinen, Miss. Laina</td>\n",
       "      <td>female</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>STON/O2. 3101282</td>\n",
       "      <td>7.9250</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>113803</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>C123</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Allen, Mr. William Henry</td>\n",
       "      <td>male</td>\n",
       "      <td>35.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>373450</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   PassengerId  Survived  Pclass  \\\n",
       "0            1         0       3   \n",
       "1            2         1       1   \n",
       "2            3         1       3   \n",
       "3            4         1       1   \n",
       "4            5         0       3   \n",
       "\n",
       "                                                Name     Sex   Age  SibSp  \\\n",
       "0                            Braund, Mr. Owen Harris    male  22.0      1   \n",
       "1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   \n",
       "2                             Heikkinen, Miss. Laina  female  26.0      0   \n",
       "3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   \n",
       "4                           Allen, Mr. William Henry    male  35.0      0   \n",
       "\n",
       "   Parch            Ticket     Fare Cabin Embarked  \n",
       "0      0         A/5 21171   7.2500   NaN        S  \n",
       "1      0          PC 17599  71.2833   C85        C  \n",
       "2      0  STON/O2. 3101282   7.9250   NaN        S  \n",
       "3      0            113803  53.1000  C123        S  \n",
       "4      0            373450   8.0500   NaN        S  "
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data_train.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0669e175",
   "metadata": {},
   "source": [
    "We can classify these data into several different clusters:\n",
    "\n",
    "* **Categorical:** These values classify the samples into sets of similar samples, like ‘survived’ ‘sex’ ‘embarked’ ‘pclass’ features in the data set.\n",
    "\n",
    "* **Numerical:** Numerical features in the data set, like ‘age’ ‘fare’ ‘sibsp’ ‘parch’\n",
    "\n",
    "* **Mixed data types:** These features are alphanumeric, like ‘ticket’ ‘cabin’\n",
    "\n",
    "* **Data type:** Among all given features, 6 of them are integer or floats, others are object.\n",
    "\n",
    "* **Feature contains empty values:** We can use the following command to see the incomplete features in the data set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "222423e9",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 891 entries, 0 to 890\n",
      "Data columns (total 12 columns):\n",
      " #   Column       Non-Null Count  Dtype  \n",
      "---  ------       --------------  -----  \n",
      " 0   PassengerId  891 non-null    int64  \n",
      " 1   Survived     891 non-null    int64  \n",
      " 2   Pclass       891 non-null    int64  \n",
      " 3   Name         891 non-null    object \n",
      " 4   Sex          891 non-null    object \n",
      " 5   Age          714 non-null    float64\n",
      " 6   SibSp        891 non-null    int64  \n",
      " 7   Parch        891 non-null    int64  \n",
      " 8   Ticket       891 non-null    object \n",
      " 9   Fare         891 non-null    float64\n",
      " 10  Cabin        204 non-null    object \n",
      " 11  Embarked     889 non-null    object \n",
      "dtypes: float64(2), int64(5), object(5)\n",
      "memory usage: 83.7+ KB\n"
     ]
    }
   ],
   "source": [
    "data_train.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "ab0412dd",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 418 entries, 0 to 417\n",
      "Data columns (total 11 columns):\n",
      " #   Column       Non-Null Count  Dtype  \n",
      "---  ------       --------------  -----  \n",
      " 0   PassengerId  418 non-null    int64  \n",
      " 1   Pclass       418 non-null    int64  \n",
      " 2   Name         418 non-null    object \n",
      " 3   Sex          418 non-null    object \n",
      " 4   Age          332 non-null    float64\n",
      " 5   SibSp        418 non-null    int64  \n",
      " 6   Parch        418 non-null    int64  \n",
      " 7   Ticket       418 non-null    object \n",
      " 8   Fare         417 non-null    float64\n",
      " 9   Cabin        91 non-null     object \n",
      " 10  Embarked     418 non-null    object \n",
      "dtypes: float64(2), int64(4), object(5)\n",
      "memory usage: 36.0+ KB\n"
     ]
    }
   ],
   "source": [
    "data_test.info()"
   ]
  },
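  {
   "cell_type": "markdown",
   "id": "c4d1a7e0",
   "metadata": {},
   "source": [
    "As a complementary check, the missing values can also be counted per column. This is a minimal sketch added for illustration, not part of the original analysis; it assumes `data_train` and `data_test` were loaded as above. Judging from the `info()` output, we would expect Age, Cabin and Embarked to show up for the training set, and Age, Fare and Cabin for the test set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c4d1a7e1",
   "metadata": {},
   "outputs": [],
   "source": [
    "# count missing values per column, then keep only the incomplete columns\n",
    "missing_train = data_train.isnull().sum()\n",
    "missing_test = data_test.isnull().sum()\n",
    "print(missing_train[missing_train > 0])\n",
    "print(missing_test[missing_test > 0])"
   ]
  },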
  {
   "cell_type": "markdown",
   "id": "b4ece06e",
   "metadata": {},
   "source": [
    "Also, we can get the distribution of numerical and categorical feature values across the samples, which helps us to make reasonable assumptions to tackle the classification problem."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "a060bfa5",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PassengerId</th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Fare</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>891.000000</td>\n",
       "      <td>891.000000</td>\n",
       "      <td>891.000000</td>\n",
       "      <td>714.000000</td>\n",
       "      <td>891.000000</td>\n",
       "      <td>891.000000</td>\n",
       "      <td>891.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>446.000000</td>\n",
       "      <td>0.383838</td>\n",
       "      <td>2.308642</td>\n",
       "      <td>29.699118</td>\n",
       "      <td>0.523008</td>\n",
       "      <td>0.381594</td>\n",
       "      <td>32.204208</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>257.353842</td>\n",
       "      <td>0.486592</td>\n",
       "      <td>0.836071</td>\n",
       "      <td>14.526497</td>\n",
       "      <td>1.102743</td>\n",
       "      <td>0.806057</td>\n",
       "      <td>49.693429</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.420000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>223.500000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>20.125000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>7.910400</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>446.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>28.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>14.454200</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>668.500000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>38.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>31.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>891.000000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>80.000000</td>\n",
       "      <td>8.000000</td>\n",
       "      <td>6.000000</td>\n",
       "      <td>512.329200</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       PassengerId    Survived      Pclass         Age       SibSp  \\\n",
       "count   891.000000  891.000000  891.000000  714.000000  891.000000   \n",
       "mean    446.000000    0.383838    2.308642   29.699118    0.523008   \n",
       "std     257.353842    0.486592    0.836071   14.526497    1.102743   \n",
       "min       1.000000    0.000000    1.000000    0.420000    0.000000   \n",
       "25%     223.500000    0.000000    2.000000   20.125000    0.000000   \n",
       "50%     446.000000    0.000000    3.000000   28.000000    0.000000   \n",
       "75%     668.500000    1.000000    3.000000   38.000000    1.000000   \n",
       "max     891.000000    1.000000    3.000000   80.000000    8.000000   \n",
       "\n",
       "            Parch        Fare  \n",
       "count  891.000000  891.000000  \n",
       "mean     0.381594   32.204208  \n",
       "std      0.806057   49.693429  \n",
       "min      0.000000    0.000000  \n",
       "25%      0.000000    7.910400  \n",
       "50%      0.000000   14.454200  \n",
       "75%      0.000000   31.000000  \n",
       "max      6.000000  512.329200  "
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data_train.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3d960fbf",
   "metadata": {},
   "source": [
    "**We can get from the numerical features that:**\n",
    "\n",
    "* Total samples are 891 or 40% of the actual number of passengers on board the Titanic \n",
    "\n",
    "* Survived is a categorical feature with 0 or 1 values.\n",
    "\n",
    "* Around 38% samples survived.\n",
    "\n",
    "* Most passengers (> 75%) did not travel with parents or children.\n",
    "\n",
    "* Nearly 30% of the passengers had siblings and/or spouse aboard.\n",
    "\n",
    "* Fares varied significantly with few passengers (<1%) paying as high as $512.\n",
    "\n",
    "* Few elderly passengers (<1%) within age range 65-80.\n"
   ]
  },
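  {
   "cell_type": "markdown",
   "id": "c4d1a7e2",
   "metadata": {},
   "source": [
    "The percentages quoted above can be checked directly. The cell below is a minimal sketch added for illustration, not part of the original analysis; it assumes `data_train` is loaded as above. Each expression builds a boolean column and takes its mean, which gives the fraction of rows that satisfy the condition."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c4d1a7e3",
   "metadata": {},
   "outputs": [],
   "source": [
    "# share of passengers travelling without parents or children\n",
    "print((data_train['Parch'] == 0).mean())\n",
    "# share of passengers with at least one sibling or spouse aboard\n",
    "print((data_train['SibSp'] > 0).mean())\n",
    "# share of passengers who paid the maximum fare\n",
    "print((data_train['Fare'] == data_train['Fare'].max()).mean())\n",
    "# share of all passengers aged 65 to 80 (rows with a missing age count as outside the range)\n",
    "print(data_train['Age'].between(65, 80).mean())"
   ]
  },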
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "ebdb70a6",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>891</td>\n",
       "      <td>891</td>\n",
       "      <td>891</td>\n",
       "      <td>204</td>\n",
       "      <td>889</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>unique</th>\n",
       "      <td>891</td>\n",
       "      <td>2</td>\n",
       "      <td>681</td>\n",
       "      <td>147</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>top</th>\n",
       "      <td>Keefe, Mr. Arthur</td>\n",
       "      <td>male</td>\n",
       "      <td>347082</td>\n",
       "      <td>G6</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>freq</th>\n",
       "      <td>1</td>\n",
       "      <td>577</td>\n",
       "      <td>7</td>\n",
       "      <td>4</td>\n",
       "      <td>644</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                     Name   Sex  Ticket Cabin Embarked\n",
       "count                 891   891     891   204      889\n",
       "unique                891     2     681   147        3\n",
       "top     Keefe, Mr. Arthur  male  347082    G6        S\n",
       "freq                    1   577       7     4      644"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data_train.describe(include=['O'])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3a6cf6af",
   "metadata": {},
   "source": [
    "**We can also get from the categorical features that:**\n",
    "\n",
    "* Names are unique across the dataset (count=unique=891)\n",
    "\n",
    "* Sex variable as two possible values with 65% male (freq=577/count=891).\n",
    "\n",
    "* Cabin values have several duplicates across samples. Alternatively several passengers shared a cabin.\n",
    "\n",
    "* Embarked takes three possible values. S port used by most passengers (top=S)\n",
    "\n",
    "* Ticket feature has high ratio (22%) of duplicate values (unique=681).\n"
   ]
  },
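  {
   "cell_type": "markdown",
   "id": "c4d1a7e4",
   "metadata": {},
   "source": [
    "The shares read off the `describe` table can also be computed explicitly. This is a minimal sketch added for illustration, assuming `data_train` is loaded as above: the Sex proportions come from normalized value counts, and the Ticket duplicate ratio is taken as 1 - unique/count."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c4d1a7e5",
   "metadata": {},
   "outputs": [],
   "source": [
    "# proportion of each Sex value\n",
    "print(data_train['Sex'].value_counts(normalize=True))\n",
    "# share of Ticket entries that repeat an already seen value: 1 - unique / count\n",
    "print(1 - data_train['Ticket'].nunique() / data_train['Ticket'].count())"
   ]
  },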
  {
   "cell_type": "markdown",
   "id": "022c0e48",
   "metadata": {},
   "source": [
    "#### 3.2 Assumptions based on the data analysis\n",
    "\n",
    "After having a brief preview of the data set, we can make some preliminary assumptions, we want to correlate some important features with survival, and complete some crucial survival-related features with empty values, etc. They are listed as follows:\n",
    "\n",
    "* **Correlating and Completing**\n",
    "\n",
    "We want to know how well does each feature correlate with Survival. And we want to complete ‘age’ feature as it is definitely correlated to survival. We also want to complete the ‘embarked’ feature as it may also correlate with survival or another important feature.\n",
    "\n",
    "* **Correcting**\n",
    "\n",
    "1.\t‘Ticket’ feature may be dropped from our analysis as it contains high ratio of duplicates (22%) and there may not be a correlation between Ticket and survival.\n",
    "\n",
    "2.\t‘Cabin’ feature may be dropped as it is highly incomplete or contains many null values both in training and test dataset.\n",
    "\n",
    "3.\t‘PassengerId’ may be dropped from training set as it does not contribute to survival.\n",
    "\n",
    "4.\tName feature maybe dropped as it is relatively non-standard, it may not contribute directly to survival.\n",
    "\n",
    "* **Classifying**\n",
    "\n",
    "We may also add to our assumptions based on the problem description noted earlier.\n",
    "\n",
    "1.\tWomen (Sex=female) were more likely to have survived.\n",
    "\n",
    "2.\tChildren (Age<?) were more likely to have survived.\n",
    "\n",
    "3.\tThe upper-class passengers (Pclass=1) were more likely to have survived.\n"
   ]
  },
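  {
   "cell_type": "markdown",
   "id": "c4d1a7e6",
   "metadata": {},
   "source": [
    "The assumptions about Sex and Pclass can be given a quick numerical check before plotting. The cell below is a minimal sketch added for illustration, not part of the original analysis; it assumes `data_train` is loaded as above. Because Survived is coded as 0 or 1, its mean within a group is that group's survival rate."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c4d1a7e7",
   "metadata": {},
   "outputs": [],
   "source": [
    "# survival rate per group: the mean of the 0/1 Survived column\n",
    "print(data_train.groupby('Sex')['Survived'].mean())\n",
    "print(data_train.groupby('Pclass')['Survived'].mean())"
   ]
  },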
  {
   "cell_type": "markdown",
   "id": "7b2b78e5",
   "metadata": {},
   "source": [
    "#### 3.3 Data Visualization\n",
    "\n",
    "Now we can continue confirming some of our assumptions using visualizations for analyzing the data. Visualization tools can help us to better observe the correlation between features and survival, and we can make some decision on what to do with data set on the basis of the conclusion in this step.\n",
    "\n",
    "##### 3.3.1 Age feature visualization\n",
    "\n",
    "In this step, we mainly use seaborn visualizing tool to illustrate. The main introduction of the function we used can be seen in section2. Analysis of other features are all alike.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "fd67996c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<seaborn.axisgrid.FacetGrid at 0x2a445c93970>"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAagAAADQCAYAAABStPXYAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAAQuUlEQVR4nO3dfZBddX3H8fdHQKngA8ElEwEb2zIo0vK0Kki11YgTH2poBQsVJ87gpH9gi62ODfWP6jid4kzH0anFMaPW+FAFUUomdoQ0QKsdBwkKSEQN1RSikSSoKE5HDXz7xz2BHbJhb3bv3fvbve/XzJ1zz7lPnw375Xt/v3P2nFQVkiS15gmjDiBJ0nRsUJKkJtmgJElNskFJkppkg5IkNckGJUlqkg1qniR5Z5KtSe5IcluSFw7ofV+bZO2A3uvBAbzHk5JcmeTuJDcnWT6AaBoTY1QnL0ny9SR7k5w3iFyL0aGjDjAOkpwFvAY4vap+meQZwBMP4vWHVtXe6R6rqg3AhsEkHYiLgZ9U1e8kuQB4L/CnI86kBWDM6uQe4E3A20eco2mOoObHMmBPVf0SoKr2VNUPAZJs7wqRJJNJburuvyvJuiTXA5/oRiPP2/eGSW5KckaSNyX5YJKnde/1hO7xJye5N8lhSX47yZeS3Jrky0me0z3n2Um+muSWJO8Z0M+6Cljf3b8aWJEkA3pvLW5jUydVtb2q7gAeHsT7LVY2qPlxPXB8ku8muSLJH/T5ujOAVVX1Z8BngdcDJFkGPLOqbt33xKp6ALgd2PfefwRcV1W/BtYBf1FVZ9D7xnZF95wPAB+qqucDPzpQiK5Yb5vm9vJpnn4scG+XaS/wAHB0nz+vxts41Yn64BTfPKiqB5OcAbwYeClwZZK1VfXxGV66oar+r7t/FbAJ+Dt6Bfi5aZ5/Jb3ptBuBC4ArkhwJvAj43JSBzJO65dnA67r7n6Q3HTdd/hfPkHOq6UZLnk9LMxqzOlEfbFDzpKoeAm4CbkryTWA18HFgL4+OZA9/zMt+MeX1P0hyf5Lfo1dcfz7Nx2wA/iHJEnrfKm8AjgB+WlWnHijaTNmTfBl4yjQPvb2q/uMx23YAxwM7khwKPA348UyfIcFY1Yn64BTfPEhyYpITpmw6Ffjf7v52ekUCj35LO5DPAu8AnlZV33zsg1X1IPA1elMSG6vqoar6GfD9JOd3WZLklO4l/03vGyTAGw70oVX14qo6dZrbdEW3gd7/VADOA24oz0isPoxZnagPNqj5cSSwPsm3ktwBnAS8q3vs3cAHum9fD83wPlfTK5SrHuc5VwIXdct93gBcnOR2YCu9AxkALgUuSXILvZHOIHwUODrJ3cBfAwM5tFdjYWzqJMnzk+wAzgc+nGTrIN53sYlfbiVJLXIEJUlqkg1KktQkG5QkqUk2KElSk+a1Qa1cubLo/T2BN2/jcJsV68TbGN6mNa8Nas+ePfP5cdKCZJ1IPU7xSZKaZIOSJDXJBiVJapINSpLUJBuUJKlJNihJUpO8HtSALV/7xcd9fPvlr56nJJK0sDmCkiQ1yQYlSWqSDUqS1CQblCSpSR4kMc88iEKS+uMISpLUJBuUJKlJNihJUpNsUJKkJtmgJElNskFJkprU12HmSbYDPwceAvZW1WSSJcCVwHJgO/D6qvrJcGLOHw8Dl6Q2HMwI6qVVdWpVTXbra4HNVXUCsLlblyRpIOYyxbcKWN/dXw+cO+c0kiR1+m1QBVyf5NYka7ptS6tqJ0C3PGa6FyZZk2RLki27d++ee2JpEbJOpP3126DOrqrTgVcClyR5Sb8fUFXrqmqyqiYnJiZmFVJa7KwTaX99Naiq+mG33AVcA7wAuC/JMoBuuWtYISVJ42fGBpXkiCRP2XcfeAVwJ7ABWN09bTVw7bBCSpLGTz+HmS8Frkmy7/n/WlVfSnILcFWSi4F7gPOHF1OSNG5mbFBV9T3glGm23w+sGEaols30d1KS
pMHwTBKSpCbZoCRJTbJBSZKaZIOSJDXJBiVJapINSpLUJBuUJKlJNihJUpNsUJKkJtmgJElNskFJkppkg5IkNckGJUlqkg1KktQkG5QkqUk2KElSk/puUEkOSfKNJBu79SVJNiXZ1i2PGl5MSdK4OZgR1KXAXVPW1wKbq+oEYHO3LknSQPTVoJIcB7wa+MiUzauA9d399cC5A00mSRpr/Y6g3g+8A3h4yralVbUToFseM90Lk6xJsiXJlt27d88lq7RoWSfS/mZsUEleA+yqqltn8wFVta6qJqtqcmJiYjZvIS161om0v0P7eM7ZwGuTvAo4HHhqkk8B9yVZVlU7kywDdg0zqCRpvMw4gqqqy6rquKpaDlwA3FBVFwEbgNXd01YD1w4tpSRp7Mzl76AuB85Jsg04p1uXJGkg+pnie0RV3QTc1N2/H1gx+EiSJHkmCUlSo2xQkqQm2aAkSU2yQUmSmnRQB0lI0sFavvaLj/v49stfPU9JtNA4gpIkNckGJUlqklN8kpo30zRhP5xKXHgcQUmSmuQIagFxZ7OkceIISpLUJBuUJKlJNihJUpNsUJKkJtmgJElNskFJkpo0Y4NKcniSryW5PcnWJO/uti9JsinJtm551PDjSpLGRT8jqF8CL6uqU4BTgZVJzgTWApur6gRgc7cuSdJAzNigqufBbvWw7lbAKmB9t309cO4wAkqSxlNf+6CSHJLkNmAXsKmqbgaWVtVOgG55zNBSSpLGTl+nOqqqh4BTkzwduCbJyf1+QJI1wBqAZz3rWbPJOFYGcVJMLTzjXCf+zutADuoovqr6KXATsBK4L8kygG656wCvWVdVk1U1OTExMbe00iJlnUj76+covolu5ESS3wBeDnwb2ACs7p62Grh2SBklSWOonym+ZcD6JIfQa2hXVdXGJF8FrkpyMXAPcP4Qc0qSxsyMDaqq7gBOm2b7/cCKYYSSJMnrQS0iXi9K0mLiqY4kSU1yBCUtQP0cmj0fI2YPEdcwOYKSJDXJBiVJapINSpLUJBuUJKlJNihJUpNsUJKkJtmgJElNskFJkppkg5IkNckzSegRnstPUkscQUmSmmSDkiQ1yQYlSWqSDUqS1KQZG1SS45PcmOSuJFuTXNptX5JkU5Jt3fKo4ceVJI2LfkZQe4G3VdVzgTOBS5KcBKwFNlfVCcDmbl2SpIGYsUFV1c6q+np3/+fAXcCxwCpgffe09cC5Q8ooSRpDB7UPKsly4DTgZmBpVe2EXhMDjjnAa9Yk2ZJky+7du+cYV1qcrBNpf303qCRHAp8H3lpVP+v3dVW1rqomq2pyYmJiNhmlRc86kfbXV4NKchi95vTpqvpCt/m+JMu6x5cBu4YTUZI0jvo5ii/AR4G7qup9Ux7aAKzu7q8Grh18PEnSuOrnXHxnA28Evpnktm7b3wKXA1cluRi4Bzh/KAklSWNpxgZVVV8BcoCHVww2jiRJPZ5JQpLUJBuUJKlJXg9qjMx0vSdpMevn999rnrXFEZQkqUk2KElSk2xQkqQm2aAkSU3yIAn1baadzO5gXnw8sEaj5AhKktQkR1CSNEDONAyOIyhJUpNsUJKkJjU5xecQWZLkCEqS1KQmR1CSNAoeVt8WR1CSpCb1c8n3jyXZleTOKduWJNmUZFu3PGq4MSVJ46afKb6PAx8EPjFl21pgc1VdnmRtt/43g4938DzAQpIWhxlHUFX1X8CPH7N5FbC+u78eOHewsSRJ4262+6CWVtVOgG55zIGemGRNki1JtuzevXuWHyctbtaJtL+hHyRRVeuqarKqJicmJob9cdKCZJ1I+5ttg7ovyTKAbrlrcJEkSZr930FtAFYDl3fLaweWSNJAeMCQFrp+DjP/DPBV4MQkO5JcTK8xnZNkG3BOty5J0sDMOIKqqgsP8NCKAWfRIua3eUkHyzNJSJKaZIOSJDXJk8VqYOZyok2nADUu+qkTf997HEFJkppkg5IkNckpPi0ITgFK48cRlCSpSQty
BDXMnfGSNGrOGPQ4gpIkNckGJUlq0oKc4pMOllMm+3O6e+Eal7+lcgQlSWqSDUqS1CQblCSpSTYoSVKTPEhCi4I7/KXFxxGUJKlJcxpBJVkJfAA4BPhIVXnpd0kaE8M+3H3WI6gkhwD/DLwSOAm4MMlJs04iSdIUc5niewFwd1V9r6p+BXwWWDWYWJKkcZeqmt0Lk/OAlVX15m79jcALq+otj3neGmBNt3oi8J3HedtnAHtmFWj+mXU4FlPWPVW1sp83sk6aYNbh6CfrtLUyl31QmWbbft2uqtYB6/p6w2RLVU3OIdO8MetwjGtW62T0zDocc8k6lym+HcDxU9aPA344h/eTJOkRc2lQtwAnJHl2kicCFwAbBhNLkjTuZj3FV1V7k7wFuI7eYeYfq6qtc8zT1xRHI8w6HGZt93Nnw6zDMRZZZ32QhCRJw+SZJCRJTbJBSZKa1ESDSrIyyXeS3J1k7ajzTJXk+CQ3JrkrydYkl3bblyTZlGRbtzxq1Fn3SXJIkm8k2ditN5k1ydOTXJ3k292/71kNZ/2r7r//nUk+k+TwUWRttVask+EZ5zoZeYNaAKdM2gu8raqeC5wJXNLlWwtsrqoTgM3deisuBe6ast5q1g8AX6qq5wCn0MvcXNYkxwJ/CUxW1cn0Dgq6gHnO2nitWCfDM751UlUjvQFnAddNWb8MuGzUuR4n77XAOfT+0n9Zt20Z8J1RZ+uyHNf9ErwM2Nhtay4r8FTg+3QH6kzZ3mLWY4F7gSX0jnzdCLxivrMupFqxTgaWc6zrZOQjKB79ofbZ0W1rTpLlwGnAzcDSqtoJ0C2PGWG0qd4PvAN4eMq2FrP+FrAb+JdumuUjSY6gwaxV9QPgH4F7gJ3AA1V1PfOfdUHUinUyUGNdJy00qL5OmTRqSY4EPg+8tap+Nuo800nyGmBXVd066ix9OBQ4HfhQVZ0G/IIGpimm082ZrwKeDTwTOCLJRaOIMs22pmrFOhm4sa6TFhpU86dMSnIYvaL7dFV9odt8X5Jl3ePLgF2jyjfF2cBrk2ynd3b5lyX5FG1m3QHsqKqbu/Wr6RVii1lfDny/qnZX1a+BLwAvYv6zNl0r1slQjHWdtNCgmj5lUpIAHwXuqqr3TXloA7C6u7+a3pz7SFXVZVV1XFUtp/fveENVXUSbWX8E3JvkxG7TCuBbNJiV3pTFmUme3P0+rKC3o3q+szZbK9bJcIx9nYx6x1q34+xVwHeB/wHeOeo8j8n2+/SmUe4AbuturwKOpreTdVu3XDLqrI/J/Yc8uvO3yazAqcCW7t/234CjGs76buDbwJ3AJ4EnjSJrq7VinQw149jWiac6kiQ1qYUpPkmS9mODkiQ1yQYlSWqSDUqS1CQblCSpSTaoRSDJHyepJM8ZdRapZdbKwmKDWhwuBL5C748OJR2YtbKA2KAWuO7cZ2cDF9MVXZInJLmiuy7LxiT/nuS87rEzkvxnkluTXLfvFCTSYmetLDw2qIXvXHrXivku8OMkpwN/AiwHfhd4M73LNOw7V9o/AedV1RnAx4C/H0FmaRTOxVpZUA4ddQDN2YX0Lh0AvRNfXggcBnyuqh4GfpTkxu7xE4GTgU29U2VxCL3T4kvjwFpZYGxQC1iSo+ldcO3kJEWviAq45kAvAbZW1VnzFFFqgrWyMDnFt7CdB3yiqn6zqpZX1fH0rr65B3hdN7++lN4JMaF3ZcuJJI9MYyR53iiCS/PMWlmAbFAL24Xs/w3w8/QuFraD3hmFP0zvyqYPVNWv6BXqe5PcTu+M0y+at7TS6FgrC5BnM1+kkhxZVQ92UxtfA86u3rVlJE1hrbTLfVCL18YkTweeCLzHgpMOyFpplCMoSVKT3AclSWqSDUqS1CQblCSpSTYoSVKTbFCSpCb9P81FgQhLzgCrAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 432x216 with 2 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Age distribution, split by survival outcome (0 = did not survive, 1 = survived)\n",
    "age = sns.FacetGrid(data_train, col='Survived')\n",
    "age.map(plt.hist, 'Age', bins=20)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "120652f5",
   "metadata": {},
   "source": [
    "* **Observations:**\n",
    "\n",
    "1.\tChildren under 4 years old have a very high survival rate.\n",
    "\n",
    "2.\tA large number of passengers aged 18-26 did not survive.\n",
    "\n",
    "3.\tMost passengers are in the 15-35 age range.\n",
    "\n",
    "4.\tNone of the elderly passengers aged 62-68 survived, but the oldest passenger (aged 80) did.\n",
    "\n",
    "* **Conclusions:**\n",
    "\n",
    "1.\tAge should be included as a feature in model training, and it should be banded into age groups because the survival rate varies greatly across age ranges.\n",
    "\n",
    "2.\tThe null values in the Age feature should also be filled in.\n",
    "\n",
    "##### 3.3.2 Pclass feature visualization"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "d988757f",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAgAAAAHUCAYAAABMP5BeAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAArZElEQVR4nO3df7RddX3n/+fLgBWrVqIXvpHAoDZaI1NiSRHF1UGQr7F2JNMpIhUNHWiqi86g33ZpKNVBW5d86yx/4nSVBU6irTVRUShjoVmR+KNCIAoBASGUIqaNJGA7SrVV8D1/nB29udybe+6959x7ztnPx1pnnb0/Z+993p9zz/vc9/nsffZOVSFJktrlcQsdgCRJmn8WAJIktZAFgCRJLWQBIElSC1kASJLUQhYAkiS1kAWAJEktZAEwB0keTXJLkq8n+WSSJx5g2YuS/P58xjdFHL+Q5Pok/3ageJKsT3LSJO2HJ7k6yY4kdyT5XA9juyzJ8h5s5+wkl/RgO8cluS3JPUk+mCRz3aYGj3k88nn8riTfSvLwXLc1aiwA5uYHVbWiqo4Bfgi8YaED6sJ3gP8G/I9Zrv9OYHNVHVtVy4F1M1k5yaKpHquqc6vqjlnG1Q9/CqwFljW3VQsbjvrEPB7tPP4r4PiFDmIQWQD0zpeAnwdI8voktzbV9ccmLpjkt5Pc1Dz+6X3fOJKc3nwL2ZHki03b85Pc2HxDuTXJsrkEWVV7quom4Eez3MQSYNe47d3axHlSkqv3tSe5JMnZzfR9Sd6e5MvAW5LcOG65o5Ps28bWJCuTvDHJn4xb5uwkH2qmzxr3evzZvg+iJL+V5O4kXwBOnGXffiLJEuApVXV9dU6X+VFg9Vy3q4FnHo9QHjd9u6GqdvdiW6PGAqAHkhwEvAK4LcnzgQuBk6vqWOD8SVa5oqp+uXn8TuCcpv3twMub9lc1bW8APlBVK4CVjEvacc+/sUmkibfX97KfjQ8Dlye5LsmFSZ7R5Xr/WlUvqap3A49P8qym/Qxg04RlPwX8+rj5M4CNSZ7XTJ/YvB6PAq9t/lm/g84HxqnApMOPSV46xev0lUkWP4L9X+tdTZtGlHnclWHLYx3AQQsdwJA7JMktzfSXgMuB3wE+VVUPAlTVdyZZ75gkfww8FXgScG3T/rfA+iSbgCuatuuBC5MspfOBs3PixqrqjN50Z3pVdW2T9KvofFjenOSYLlbdOG56E/Bq4GI6HwT7xV9Ve5Pcm+QEYCfwXDqvzXnAccBN6eyOPwTYA7wQ2FpVe6HzQQo8Z5LYrwNWdNnVyfb3e+GM0WQej24e6wAsAObmB00F+xPpvKOn+0exHlhdVTua4bWTAKrqDUleCLwSuCXJiqr6eJJtTdu1Sc6tqs9PeM6NdJJrovdW1Udn3q0Daz4MPw58vBku/BXgAfYfUXrChNX+Zdz0RuCTSa7obO6xH4bNMq8GvgF8pqqqeW03VNUF4xdMspou/jkneSnwvkke+n5VvXhC2y5g6bj5pcA/TvccGkrm8ejmsQ7AAqD3tgCfSfK+qnooyeJJvj08Gdid5GDgtcA/ACR5dlVtA7Yl+Y/AkUl+Dri3qj7YVOy/COz3wTGf3xySnAzcUFXfT/Jk4NnA/cC3geVJfobOh8YpwJcn20ZV/V2SR4G3sf83ivGuoDME+03grU3bFuDK5rXdk2QxnddyG/CBJE8DvgucDuyY5Hm7/uZQVbuTfK/59rINeD3woW7W1Ugwj0cgj3VgFgA9VlW3J3kX8IUmOW4Gzp6w2NvovNm/CdxG580P8J50Dg4KnSTZQefo3LOS/IhOcr5zLvEl+X+A7cBTgB8neROwvKq+2+UmjgMuSfIInW8KlzUHI9EMed5KZ7jv5mm2sxF4D/DMyR6sqn9KckcT241N2x1J/hD4mySPo3MA1HlVdUOSi+gMs+4GvgZMeZTyDLyRzre8Q4C/bm5qAfN4dPI4nQMRfxN4YpJddPp6
0Vy3OwrSOcBZ2l+S9cD6qtq6wKFImiXzWAfirwAkSWohCwBN5bPAfQscg6S5+SzmsabgLgBJklrIEQBJklpoXn8FsGrVqrrmmmvm8yklTW3WFzcyl6WBMqtcntcRgAcffHA+n05Sn5jL0vBzF4AkSS1kASBJUgtZAEiS1EIWAJIktZAFgCRJLWQBIElSC1kASJLUQl0XAEkWJbk5ydXN/OIkm5PsbO4P7V+YkiSpl2YyAnA+cOe4+XXAlqpaRuea1+t6GZgkSeqfrgqAJEuBVwKXjWs+DdjQTG8AVvc0MkmS1DfdjgC8H3gL8ONxbYdX1W6A5v6w3oYmSZL6ZdoCIMmvAXuq6quzeYIka5NsT7J97969s9mEpAFgLkujpZsRgBOBVyW5D/gEcHKSPwceSLIEoLnfM9nKVXVpVa2sqpVjY2M9ClvSfDOXpdEybQFQVRdU1dKqOhp4DfD5qjoLuApY0yy2Briyb1FKkqSemst5AC4GTk2yEzi1mZckSUPgoJksXFVbga3N9EPAKb0PSZIk9ZtnApQkqYUsACRJaiELAEmSWsgCQJKkFrIAkCSphSwAJElqIQsASZJayAJAkqQWsgCQJKmFLAAkSWohCwBJklrIAkCSpBayAJAkqYUsACRJaiELAEmSWsgCQJKkFrIAkCSphaYtAJI8IcmNSXYkuT3JO5r2xUk2J9nZ3B/a/3AlSVIvdDMC8G/AyVV1LLACWJXkBGAdsKWqlgFbmnlJkjQEpi0AquPhZvbg5lbAacCGpn0DsLofAUqSpN7r6hiAJIuS3ALsATZX1Tbg8KraDdDcH9a3KCVJUk91VQBU1aNVtQJYChyf5JhunyDJ2iTbk2zfu3fvLMOUtNDMZWm0zOhXAFX1z8BWYBXwQJIlAM39ninWubSqVlbVyrGxsblFK2nBmMvSaOnmVwBjSZ7aTB8CvAz4BnAVsKZZbA1wZZ9ilCRJPXZQF8ssATYkWUSnYNhUVVcnuR7YlOQc4H7g9D7GKUmSemjaAqCqbgVeMEn7Q8Ap/QhKkiT1l2cClCSphSwAJElqIQsASZJayAJAkqQWsgCQJKmFLAAkSWohCwBJklrIAkCSpBayAJAkqYUsACRJaiELAEmSWsgCQJKkFrIAkCSphSwAJElqIQsASZJayAJAkqQWsgCQJKmFpi0AkhyZ5Lokdya5Pcn5TfviJJuT7GzuD+1/uJIkqRe6GQF4BPi9qnoecAJwXpLlwDpgS1UtA7Y085IkaQhMWwBU1e6q+loz/T3gTuAI4DRgQ7PYBmB1n2KUJEk9NqNjAJIcDbwA2AYcXlW7oVMkAIf1PDpJktQXB3W7YJInAZ8G3lRV303S7XprgbUARx111GxilDQAzOWF977Nd3e13JtPfU6fI9Eo6GoEIMnBdP75/0VVXdE0P5BkSfP4EmDPZOtW1aVVtbKqVo6NjfUiZkkLwFyWRks3vwIIcDlwZ1W9d9xDVwFrmuk1wJW9D0+SJPVDN7sATgReB9yW5Jam7Q+Ai4FNSc4B7gdO70uEkiSp56YtAKrqy8BUO/xP6W04kiRpPngmQEmSWsgCQJKkFrIAkCSphSwAJElqIQsASZJayAJAkqQWsgCQJKmFLAAkSWohCwBJklrIAkCSpBbq+nLAkiQvyavR4QiAJEktZAEgSVILWQBIktRCFgCSJLWQBwFK0ojxQEV1Y9oRgCQfSbInydfHtS1OsjnJzub+0P6GKUmSeqmbXQDrgVUT2tYBW6pqGbClmZckSUNi2l0AVfXFJEdPaD4NOKmZ3gBsBd7ay8AkqQ26Ha4fFu5+GB6zPQjw8KraDdDcH9a7kCRJUr/1/VcASdYm2Z5k+969e/v9dJL6xFyWRstsC4AHkiwBaO73TLVgVV1aVSurauXY2Ngsn07SQjOXpdEy2wLgKmBNM70GuLI34UiSpPkw7UGASf6SzgF/T0+yC/jvwMXApiTnAPcDp/czSEnqp34ciDdqB/dp9HTzK4Azp3jo
lB7HIkmS5olnAtSC6ObbkT8TkgaHIxqjx2sBSJLUQhYAkiS1kLsA1HMOFUrDwVxtN0cAJElqIUcAJDwoUVL7OAIgSVILWQBIktRC7gIYcPM5ND2Mw+DDGLPmjwe5Da5+/G3M9ZlxBECSpBayAJAkqYXcBTACHAaXJM2UIwCSJLWQIwCz0Ktv3MN4gJIxz/25HI2RNAgcAZAkqYUsACRJaiF3AfTJoA2VD1o83Ri0mActnmExk9fN3SOai27fa77POuY0ApBkVZK7ktyTZF2vgpIkSf016xGAJIuADwOnAruAm5JcVVV39Cq4fTywStJ4jsZoWA3SKMVcRgCOB+6pqnur6ofAJ4DTehOWJEnqp7kUAEcA3xo3v6tpkyRJAy5VNbsVk9OBl1fVuc3864Djq+q/TlhuLbC2mX0ucNc0m3468OCsghpM9mfwjVqfuu3Pg1W1qtuNmsv2Z8C1uT8zyuV95lIAvAi4qKpe3sxfAFBV757VBn+63e1VtXIu2xgk9mfwjVqfBqU/gxJHr9ifwWZ/Zm4uuwBuApYleWaSxwOvAa7qTViSJKmfZv0rgKp6JMnvAtcCi4CPVNXtPYtMkiT1zZxOBFRVnwM+16NY9rm0x9tbaPZn8I1anwalP4MSR6/Yn8Fmf2Zo1scASJKk4eW1ACRJaiELAEmSWsgCQJKkFrIAkCSphSwAJElqIQsASZJayAJAkqQWsgCQJKmFLAAkSWohCwBJklrIAkCSpBayAJiDJI8muSXJ15N8MskTD7DsRUl+fz7jmyKO1ya5tbl9JcmxUyy3PslJk7QfnuTqJDuS3JGkZxeDSnJZkuU92M7ZSS7pwXaOS3JbknuSfDBJ5rpNDR7zeOTz+F1JvpXk4blua9RYAMzND6pqRVUdA/wQeMNCB9SFvwf+Q1X9IvBHzPyKU+8ENlfVsVW1HFg3k5WTLJrqsao6t6rumGE8/fSnwFpgWXNbtbDhqE/M49HO478Cjl/oIAaRBUDvfAn4eYAkr28q8x1JPjZxwSS/neSm5vFP7/vGkeT05lvIjiRfbNqen+TG5hvKrUmWzSXIqvpKVf1TM3sDsHSGm1gC7Bq3vVubOE9KcvW4Pl6S5Oxm+r4kb0/yZeAtSW4ct9zRSfZtY2uSlUnemORPxi1zdpIPNdNnjXs9/mzfB1GS30pyd5IvACfOsE+PkWQJ8JSqur46l8z8KLB6rtvVwDOPRyiPm77dUFW7e7GtUWMB0ANJDgJeAdyW5PnAhcDJVXUscP4kq1xRVb/cPH4ncE7T/nbg5U37q5q2NwAfqKoVwErGJe2459/YJNLE2+unCf0c4K9n1ls+DFye5LokFyZ5Rpfr/WtVvaSq3g08PsmzmvYzgE0Tlv0U8Ovj5s8ANiZ5XjN9YvN6PAq8tvln/Q46HxinApMOPyZ56RSv01cmWfwI9n+tdzVtGlHmcVeGLY91AActdABD7pAktzTTXwIuB34H+FRVPQhQVd+ZZL1jkvwx8FTgScC1TfvfAuuTbAKuaNquBy5MspTOB87OiRurqjNmGniSl9L54HjJTNarqmubpF9F58Py5iTHdLHqxnHTm4BXAxfT+SDYL/6q2pvk3iQnADuB59J5bc4DjgNuSmd3/CHAHuCFwNaq2tv0bSPwnElivw5Y0WVXJ9vfX12uq+FiHo9uHusALADm5gdNBfsT6byjp/tHsR5YXVU7muG1kwCq6g1JXgi8ErglyYqq+niSbU3btUnOrarPT3jOjXSSa6L3VtVHJzYm+UXgMuAVVfXQ9N3cX/Nh+HHg481w4a8AD7D/iNITJqz2L+OmNwKfTHJFZ3OP/TBslnk18A3gM1VVzWu7oaoumNCf1XTxz7n5sHzfJA99v6pePKFtF/sPqy4F/nG659BQMo9HN491IFXlbZY34OFJ2p4P3A08rZlf3NxfBPx+M/0gcBhwMLAZWN+0P3vcdm6mU+U+C0jT9n7gTXOM+SjgHuDF0yy3HjhpkvaTgSc200+m
M/T5y8CRwH3AzwA/R+cgpbOb5e4Dnj5hOzcBHwPeMq5tK7CymT4UuBe4Dji+aVtO55vEYfteW+Df0dmf+U3gac1r+iXgkh78fW8CTqAzGvDXwK8u9HvOW+9v5vFo5/GB/s5tvzkC0GNVdXuSdwFfSPIonQ+Asycs9jZgG503+210EhDgPekcHBRgC7CDztG5ZyX5EfBtOkfvzsXb6STY/2yG3x6pqpUzWP844JIkj9D5pnBZVd0E0Ax53konuW+eZjsbgfcAz5zswar6pyR3AMur6sam7Y4kfwj8TZLHAT8CzquqG5JcRGeYdTfwNWDKo5Rn4I10PkAPoVMAzHQ/q4aUeTw6eZzOgYi/CTwxyS46fb1ortsdBfsqUmk/SdbT+UazdYFDkTRL5rEOxF8BSJLUQhYAmspn6ezzkzS8Pot5rCm4C0CSpBZyBECSpBayAJAkqYXm9WeAq1atqmuuuWY+n1LS1GZ9dUNzWRoos8rleR0BePDBB+fz6ST1ibksDT93AUiS1EIWAJIktZAFgCRJLWQBIElSC1kASJLUQhYAkiS1kAWAJEkt1HUBkGRRkpuTXN3ML06yOcnO5v7Q/oUpSZJ6aSYjAOcDd46bXwdsqaplwJZmXpIkDYGuCoAkS4FXApeNaz4N2NBMbwBW9zQySZLUN92OALwfeAvw43Fth1fVboDm/rDehiZJkvpl2gIgya8Be6rqq7N5giRrk2xPsn3v3r2z2YSkAWAuS6OlmxGAE4FXJbkP+ARwcpI/Bx5IsgSgud8z2cpVdWlVrayqlWNjYz0KW9J8M5el0TJtAVBVF1TV0qo6GngN8PmqOgu4CljTLLYGuLJvUUqSpJ6ay3kALgZOTbITOLWZlyRJQ+CgmSxcVVuBrc30Q8ApvQ9JkiT1m2cClCSphSwAJElqIQsASZJayAJAkqQWsgCQJKmFLAAkSWohCwBJklrIAkCSpBayAJAkqYUsACRJaiELAEmSWsgCQJKkFrIAkCSphSwAJElqIQsASZJayAJAkqQWmrYASPKEJDcm2ZHk9iTvaNoXJ9mcZGdzf2j/w5UkSb3QzQjAvwEnV9WxwApgVZITgHXAlqpaBmxp5iVJ0hCYtgCojoeb2YObWwGnARua9g3A6n4EKEmSeq+rYwCSLEpyC7AH2FxV24DDq2o3QHN/WN+ilCRJPdVVAVBVj1bVCmApcHySY7p9giRrk2xPsn3v3r2zDFPSQjOXpdEyo18BVNU/A1uBVcADSZYANPd7pljn0qpaWVUrx8bG5hatpAVjLkujpZtfAYwleWozfQjwMuAbwFXAmmaxNcCVfYpRkiT12EFdLLME2JBkEZ2CYVNVXZ3kemBTknOA+4HT+xinJEnqoWkLgKq6FXjBJO0PAaf0IyhJktRfnglQkqQWsgCQJKmFLAAkSWohCwBJklrIAkCSpBayAJAkqYUsACRJaiELAEmSWsgCQJKkFrIAkCSphSwAJElqIQsASZJayAJAkqQWsgCQJKmFLAAkSWohCwBJklrIAkCSpBaatgBIcmSS65LcmeT2JOc37YuTbE6ys7k/tP/hSpKkXuhmBOAR4Peq6nnACcB5SZYD64AtVbUM2NLMS5KkITBtAVBVu6vqa83094A7gSOA04ANzWIbgNV9ilGSJPXYjI4BSHI08AJgG3B4Ve2GTpEAHNbz6CRJUl90XQAkeRLwaeBNVfXdGay3Nsn2JNv37t07mxglDQBzWRotXRUASQ6m88//L6rqiqb5gSRLmseXAHsmW7eqLq2qlVW1cmxsrBcxS1oA5rI0Wrr5FUCAy4E7q+q94x66CljTTK8Brux9eJIkqR8O6mKZE4HXAbcluaVp+wPgYmBTknOA+4HT+xKhJEnquWkLgKr6MpApHj6lt+FIkqT54JkAJUlqIQsASZJayAJAkqQWsgCQJKmFLAAkSWohCwBJklrIAkCSpBayAJAkqYUsACRJaiELAEmSWsgCQJKkFrIAkCSphSwAJElq
IQsASZJayAJAkqQWsgCQJKmFLAAkSWqhaQuAJB9JsifJ18e1LU6yOcnO5v7Q/oYpSZJ66aAullkPXAJ8dFzbOmBLVV2cZF0z/9beh6d93rf57ikfe/Opz5nHSCRJo2DaEYCq+iLwnQnNpwEbmukNwOrehiVJkvpptscAHF5VuwGa+8N6F5IkSeq3bnYBzEmStcBagKOOOqrfTzfw+jGU7+4BzYfZ5vKB3p/ge/RAfO3UT7MdAXggyRKA5n7PVAtW1aVVtbKqVo6Njc3y6SQtNHNZGi2zHQG4ClgDXNzcX9mziCRpSPgNXcOsm58B/iVwPfDcJLuSnEPnH/+pSXYCpzbzkiRpSEw7AlBVZ07x0Ck9jkWSJM2Tvh8EqMHlwYPSwppuF4LUT54KWJKkFnIEYID049uA3zCkA+vnSJj5p0HmCIAkSS1kASBJUgu5C2CWHNqTJA0zRwAkSWohCwBJklrIAkCSpBayAJAkqYU8CFCT8iyBmi9eUGf25vra+dq3myMAkiS1kAWAJEkt5C4A9dRsz48w26HG+X4+zb+FHKYe9vN9LHT87mIYbI4ASJLUQo4AaMa8aJGGie+t2ev3N3hHCBbWnEYAkqxKcleSe5Ks61VQkiSpv2ZdACRZBHwYeAWwHDgzyfJeBSZJkvpnLrsAjgfuqap7AZJ8AjgNuKMXgUn9NJehRw88lDoWeveKuxDmZi67AI4AvjVuflfTJkmSBlyqanYrJqcDL6+qc5v51wHHV9V/nbDcWmBtM/tc4K5pNv104MFZBTWY7M/gG7U+ddufB6tqVbcbNZftz4Brc39mlMv7zKUAeBFwUVW9vJm/AKCq3j2rDf50u9urauVctjFI7M/gG7U+DUp/BiWOXrE/g83+zNxcdgHcBCxL8swkjwdeA1zVm7AkSVI/zfogwKp6JMnvAtcCi4CPVNXtPYtMkiT1zZxOBFRVnwM+16NY9rm0x9tbaPZn8I1anwalP4MSR6/Yn8Fmf2Zo1scASJKk4eW1ACRJaiELAEmSWsgCQJKkFrIAkCSphSwAJElqIQsASZJayAJAkqQWsgCQJKmFLAAkSWohCwBJklrIAkCSpBayAJAkqYUsAOYgyaNJbkny9SSfTPLEAyx7UZLfn8/4pojjtCS3NnFvT/KSKZZbn+SkSdoPT3J1kh1J7kjSs6tBJrksyfIebOfsJJf0YDvHJbktyT1JPpgkc92mBo95PPJ5/K4k30ry8Fy3NWosAObmB1W1oqqOAX4IvGGhA+rCFuDYqloB/Bfgshmu/05gc1UdW1XLgXUzWTnJoqkeq6pzq+qOGcbTT38KrAWWNbdVCxuO+sQ8Hu08/ivg+IUOYhBZAPTOl4CfB0jy+qY635HkYxMXTPLbSW5qHv/0vm8cSU5vvoXsSPLFpu35SW5sKv1bkyybS5BV9XD99BrQPwvM9HrQS4Bd47Z3axPnSUmuHtfHS5Kc3Uzfl+TtSb4MvCXJjeOWOzrJvm1sTbIyyRuT/Mm4Zc5O8qFm+qxxr8ef7fsgSvJbSe5O8gXgxBn26TGSLAGeUlXXN6/XR4HVc92uBp55PEJ53PTthqra3YttjRoLgB5IchDwCuC2JM8HLgROrqpjgfMnWeWKqvrl5vE7gXOa9rcDL2/aX9W0vQH4QFPpr2Rc0o57/o1NIk28vX6KeP9Tkm8A/5vOt4eZ+DBweZLrklyY5BldrvevVfWSqno38Pgkz2razwA2TVj2U8Cvj5s/A9iY5HnN9InN6/Eo8Nrmn/U76HxgnApMOvyY5KVTvE5fmWTxI9j/td7VtGlEmcddGbY81gEctNABDLlDktzSTH8JuBz4HeBTVfUgQFV9Z5L1jknyx8BTgScB1zbtfwusT7IJuKJpux64MMlSOh84OydurKrOmEnQVfUZ4DNJfgX4I+BlM1j32ibpV9H5sLw5yTFdrLpx3PQm4NXAxXQ+CPaLv6r2Jrk3yQnATuC5dF6b84DjgJvS2R1/CLAHeCGwtar2
QueDFHjOJLFfB6zosquT7e+f6bcsDQfzeHTzWAdgATA3P2gq2J9I5x093T+K9cDqqtrRDK+dBFBVb0jyQuCVwC1JVlTVx5Nsa9quTXJuVX1+wnNupJNcE723qj46VRBV9cUkz07y9H0fdN1oPgw/Dny8GS78FeAB9h9ResKE1f5l3PRG4JNJruhs7rEfhs0yrwa+AXymqqp5bTdU1QXjF0yymi7+OSd5KfC+SR76flW9eELbLmDpuPmlwD9O9xwaSubx6OaxDsACoPe20KnK31dVDyVZPMm3hycDu5McDLwW+AeAJM+uqm3AtiT/ETgyyc8B91bVB5uK/ReB/T44ZvLNIcnPA3/XJOIvAY8HHprB+icDN1TV95M8GXg2cD/wbWB5kp+h86FxCvDlybZRVX+X5FHgbez/jWK8K+gMwX4TeGvTtgW4snlt9yRZTOe13AZ8IMnTgO8CpwM7Jnnerr85VNXuJN9rvr1sA14PfKibdTUSzOMRyGMdmAVAj1XV7UneBXyhSY6bgbMnLPY2Om/2bwK30XnzA7wnnYODQidJdtA5OvesJD+ik5zvnGOI/xl4fbO9HwBnjDuYqBvHAZckeYTON4XLquomgGbI81Y6w303T7OdjcB7gGdO9mBV/VOSO4DlVXVj03ZHkj8E/ibJ44AfAedV1Q1JLqIzzLob+Bow5VHKM/BGOt/yDgH+urmpBczj0cnjdA5E/E3giUl20enrRXPd7ijIzN4zaosk64H1VbV1gUORNEvmsQ7EXwFIktRCFgCaymeB+xY4Bklz81nMY03BXQCSJLWQIwCSJLXQvP4KYNWqVXXNNdfM51NKmtqsL25kLksDZVa5PK8jAA8+2PU5KiQNMHNZGn7uApAkqYUsACRJaiELAEmSWsgCQJKkFvJaAH3yvs13d7Xcm099zNUuJUnqO0cAJElqIQsASZJaqOsCIMmiJDcnubqZX5xkc5Kdzf2h/QtTkiT10kxGAM4H7hw3vw7YUlXL6Fzzel0vA5MkSf3TVQGQZCnwSuCycc2nARua6Q3A6p5GJkmS+qbbEYD3A28Bfjyu7fCq2g3Q3B/W29AkSVK/TFsAJPk1YE9VfXU2T5BkbZLtSbbv3bt3NpuQNADMZWm0dDMCcCLwqiT3AZ8ATk7y58ADSZYANPd7Jlu5qi6tqpVVtXJsbKxHYUuab+ayNFqmLQCq6oKqWlpVRwOvAT5fVWcBVwFrmsXWAFf2LUpJktRTczkT4MXApiTnAPcDp/cmpMHX7Vn+JEkaVDMqAKpqK7C1mX4IOKX3IUmSpH7zTICSJLWQBYAkSS1kASBJUgtZAEiS1EIWAJIktZAFgCRJLTSX8wCoB7o5p8CbT33OPEQiSWoTRwAkSWohCwBJklrIAkCSpBayAJAkqYUsACRJaiELAEmSWsgCQJKkFrIAkCSphSwAJElqIc8EOAS6OVsgeMZASVL3ph0BSPKEJDcm2ZHk9iTvaNoXJ9mcZGdzf2j/w5UkSb3QzS6AfwNOrqpjgRXAqiQnAOuALVW1DNjSzEuSpCEwbQFQHQ83swc3twJOAzY07RuA1f0IUJIk9V5XBwEmWZTkFmAPsLmqtgGHV9VugOb+sL5FKUmSeqqrAqCqHq2qFcBS4Pgkx3T7BEnWJtmeZPvevXtnGaakhWYuS6NlRj8DrKp/BrYCq4AHkiwBaO73TLHOpVW1sqpWjo2NzS1aSQvGXJZGSze/AhhL8tRm+hDgZcA3gKuANc1ia4Ar+xSjJEnqsW7OA7AE2JBkEZ2CYVNVXZ3kemBTknOA+4HT+xinJEnqoWkLgKq6FXjBJO0PAaf0IyhJktRfngpYkqQWsgCQJKmFLAAkSWohCwBJklrIAkCSpBayAJAkqYUsACRJaiELAEmSWsgCQJKkFrIAkCSphSwAJElqIQsASZJayAJAkqQWsgCQJKmFLAAkSWohCwBJklrIAkCSpBaatgBIcmSS65LcmeT2JOc37YuTbE6ys7k/tP/h
SpKkXuhmBOAR4Peq6nnACcB5SZYD64AtVbUM2NLMS5KkITBtAVBVu6vqa83094A7gSOA04ANzWIbgNV9ilGSJPXYjI4BSHI08AJgG3B4Ve2GTpEAHNbz6CRJUl8c1O2CSZ4EfBp4U1V9N0m3660F1gIcddRRs4lRXXrf5runXebNpz5nHiLRKDKXpdHS1QhAkoPp/PP/i6q6oml+IMmS5vElwJ7J1q2qS6tqZVWtHBsb60XMkhaAuSyNlmlHANL5qn85cGdVvXfcQ1cBa4CLm/sr+xKhFoSjCZI02rrZBXAi8DrgtiS3NG1/QOcf/6Yk5wD3A6f3JUJJktRz0xYAVfVlYKod/qf0NhxJkjQfPBOgJEktZAEgSVILdf0zQI2Gbg7ukySNPkcAJElqIQsASZJayAJAkqQWsgCQJKmFLAAkSWohCwBJklrIAkCSpBayAJAkqYUsACRJaiELAEmSWsgCQJKkFrIAkCSphQb6YkDdXLjmzac+Zx4ikSRptEw7ApDkI0n2JPn6uLbFSTYn2dncH9rfMCVJUi91swtgPbBqQts6YEtVLQO2NPOSJGlITLsLoKq+mOToCc2nASc10xuArcBbexlYr7k7QZKkn5rtQYCHV9VugOb+sN6FJEmS+q3vBwEmWQusBTjqqKP6/XSaR92MqnTL0ZfB19Zcnu59Pt17d67rS/0y2xGAB5IsAWju90y1YFVdWlUrq2rl2NjYLJ9O0kIzl6XRMtsC4CpgTTO9BriyN+FIkqT5MO0ugCR/SeeAv6cn2QX8d+BiYFOSc4D7gdP7GaSk4eTw9/R8jbRQuvkVwJlTPHRKj2ORJEnzZKDPBChptM3Ht99eHqwqjRKvBSBJUgtZAEiS1ELuApCkOXAXg4aVIwCSJLWQIwCSNMC8jon6xREASZJayAJAkqQWGvpdAL08AMeDeRZOt6+9Q52S1BuOAEiS1EIWAJIktZAFgCRJLWQBIElSC1kASJLUQhYAkiS1kAWAJEktNPTnAZCGiad1nZnpXi9fq465nsPE17Gd5jQCkGRVkruS3JNkXa+CkiRJ/TXrEYAki4APA6cCu4CbklxVVXf0KjhpokE9W6PfoBbGoL4fhs0ojCA4WjRzcxkBOB64p6ruraofAp8ATutNWJIkqZ/mUgAcAXxr3Pyupk2SJA24uRwEmEna6jELJWuBtc3sw0numma7TwcenENcg8b+DL459+n/61EgPdpWt/25pqpWdbtRc9n+TKWX7/85OGB/BiTGmZjJ32dGubxPqh7zP7u7FZMXARdV1cub+QsAqurds9rgT7e7vapWzmUbg8T+DL5R69Og9GdQ4ugV+zPY7M/MzWUXwE3AsiTPTPJ44DXAVb0JS5Ik9dOsdwFU1SNJfhe4FlgEfKSqbu9ZZJIkqW/mdCKgqvoc8LkexbLPpT3e3kKzP4Nv1Po0KP0ZlDh6xf4MNvszQ7M+BkCSJA0vrwUgSVILDVQBMOynFk5yZJLrktyZ5PYk5zfti5NsTrKzuT90oWOdiSSLktyc5Opmfmj7k+SpST6V5BvN3+lFQ96fNzfvta8n+cskT1jo/pjHg2mU8hjM5V4YmAJg3KmFXwEsB85Msnxho5qxR4Dfq6rnAScA5zV9WAdsqaplwJZmfpicD9w5bn6Y+/MBOr+Z/QXgWDr9Gsr+JDkC+G/Ayqo6hs7BuK9hAftjHg+0UcpjMJfnrqoG4ga8CLh23PwFwAULHdcc+3QlnWsl3AUsadqWAHctdGwz6MPS5o13MnB10zaU/QGeAvw9zbEv49qHtT/7zsa5mM4BvVcD/+9C9sc8HszbKOVxE6+53IPbwIwAMGKnFk5yNPACYBtweFXtBmjuD1vA0Gbq/cBbgB+PaxvW/jwL2Av8r2Yo9LIkP8uQ9qeq/gH4H8D9wG7g/1TV37Cw/TGPB9P7GZ08BnO5JwapAOjq1MLDIMmTgE8Db6qq7y50PLOV5NeAPVX11YWOpUcOAn4J+NOqegHwLwzJ
EOFkmv2BpwHPBJ4B/GySsxY2KvN40IxgHoO53BODVADsAo4cN78U+McFimXWkhxM50PjL6rqiqb5gSRLmseXAHsWKr4ZOhF4VZL76Fzt8eQkf87w9mcXsKuqtjXzn6LzITKs/XkZ8PdVtbeqfgRcAbyYhe2PeTx4Ri2PwVzuiUEqAIb+1MJJAlwO3FlV7x330FXAmmZ6DZ19igOvqi6oqqVVdTSdv8fnq+oshrc/3wa+leS5TdMpwB0MaX/oDBeekOSJzXvvFDoHQi1kf8zjATNqeQzmcs8s9MEPEw6E+FXgbuDvgAsXOp5ZxP8SOsOdtwK3NLdfBZ5G5wCcnc394oWOdRZ9O4mfHjw0tP0BVgDbm7/RZ4FDh7w/7wC+AXwd+BjwMwvdH/N4cG+jksdN/ObyHG+eCVCSpBYapF0AkiRpnlgASJLUQhYAkiS1kAWAJEktZAEgSVILWQDoMZL8pySV5BcWOhZJs2cu60AsADSZM4Ev0zlpiKThZS5rShYA2k9z/vMTgXNoPjSSPC7J/2yuVX11ks8l+Y3mseOSfCHJV5Ncu++0lZIWlrms6VgAaKLVdK6xfTfwnSS/BPw6cDTw74Fz6Vzydd/50j8E/EZVHQd8BHjXAsQs6bFWYy7rAA5a6AA0cM6kc+lQ6Fw45EzgYOCTVfVj4NtJrmsefy5wDLC5c/pqFtG5lKWkhWcu64AsAPQTSZ4GnAwck6TofAgU8JmpVgFur6oXzVOIkrpgLqsb7gLQeL8BfLSq/l1VHV1VRwJ/DzwI/Odm/+HhdC4oAnAXMJbkJ8OISZ6/EIFL2o+5rGlZAGi8M3nsN4RPA8+gc/3trwN/BmwD/k9V/ZDOB83/n2QHnaumvXjeopU0FXNZ0/JqgOpKkidV1cPN0OKNwInVuSa3pCFiLmsfjwFQt65O8lTg8cAf+YEhDS1zWYAjAJIktZLHAEiS1EIWAJIktZAFgCRJLWQBIElSC1kASJLUQhYAkiS10P8F0dAx82u2TNQAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 514.88x475.2 with 6 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Age distribution for each (Pclass, Survived) combination\n",
    "pclass = sns.FacetGrid(data_train, col='Survived', row='Pclass', height=2.2, aspect=1.6)\n",
    "pclass.map(plt.hist, 'Age', alpha=.5, bins=20)\n",
    "pclass.add_legend();  # add a legend to the figure"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "40e8353a",
   "metadata": {},
   "source": [
    "* **Observations:**\n",
    "\n",
    "1.\tClass 3 has the most passengers, but most of them did not survive.\n",
    "\n",
    "2.\tMost infants (age < 4) in Classes 2 and 3 survived.\n",
    "\n",
    "3.\tMost passengers in Class 1 survived.\n",
    "\n",
    "* **Conclusions:**\n",
    "\n",
    "The conclusion in this part is consistent with the previous conclusion.\n",
    "\n",
    "##### 3.3.3 Embarked correlating features visualization\n",
    "\n",
    "We correlate pclass, sex with embarked feature all together.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "73642dfd",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\Users\\86187\\anaconda3\\lib\\site-packages\\seaborn\\axisgrid.py:316: UserWarning: The `size` parameter has been renamed to `height`; please update your code.\n",
      "  warnings.warn(msg, UserWarning)\n",
      "C:\\Users\\86187\\anaconda3\\lib\\site-packages\\seaborn\\axisgrid.py:643: UserWarning: Using the pointplot function without specifying `order` is likely to produce an incorrect plot.\n",
      "  warnings.warn(warning)\n",
      "C:\\Users\\86187\\anaconda3\\lib\\site-packages\\seaborn\\axisgrid.py:648: UserWarning: Using the pointplot function without specifying `hue_order` is likely to produce an incorrect plot.\n",
      "  warnings.warn(warning)\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<seaborn.axisgrid.FacetGrid at 0x2a446886970>"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAnIAAACXCAYAAACROLaNAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuNCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8QVMy6AAAACXBIWXMAAAsTAAALEwEAmpwYAAA0OElEQVR4nO3deXxU9bn48c8zM9lXQhIIS1hkU3ZlEesG4oa2erXWVq/Wbqi32t77s71WW1tbtWr3aquV1rrVpVq0tVcEFXBlR5awCigQICwhZJ9kMjPP74+ZhCRkI8nMZCbP+/WaV2bO98w5zwnfDM98z3cRVcUYY4wxxkQfR6QDMMYYY4wxnWOJnDHGGGNMlLJEzhhjjDEmSlkiZ4wxxhgTpSyRM8YYY4yJUpbIGWOMMcZEKUvkwkBEfCKyvtHjByfx3vNF5P+6eP53RWRKJ9/7tIh8sYvnd4jIIyKySUQKRGS1iAzryjFN6PT2+ho8zigRWSAiO0Vkq4i8LCL9unpc0/2svoKIjBWRJSLyiYjsEpGfioj9/95LuCIdQC/hVtVJkTixiDgjcd5mrgUGABNU1S8ig4CqCMdkWter66uIJAJvAP9PVf8d3DYTyAEORTI206LeXl+TgNeBW1X1LRFJBuYD3wV+G9HgTFhYxh5BIrJbRH4uIstFZI2InC4ii4LfqG5ptGu6iLwmIltE5E/137RE5PHg+zaLyE+bHffHIvIhcE2j7Q4ReUZE7hcRp4j8Mtg6tlFEbg7uIyLyh+C53gByu+FS84AiVfUDqOo+VT3WDcc1YdSL6ut1wPL6JA5AVZeq6qZuOLYJk15WXz9S1bcAVLUauA34fjcc20QBa5ELjyQRWd/o9YOq+vfg80JVnSEivwWeBj4HJAKbgT8F95kGnAbsARYCVwH/AH6oqiXBb4WLRWSCqm4MvqdGVc8GCH5ouYDngU2q+oCIzAXKVHWqiCQAH4nIW8BkYDQwHugHbAH+2vyCROT7wPUtXOv7qvqdZtteBj4UkXOAxcDfVHVdO78zEzm9vb6OA9a2+1syPUVvr69jaVZfVXWXiCSJSKaqlrb8azOxwhK58Gir6f/14M8CIFVVK4AKEakRkcxg2SpV/RRARF4EzibwQfOl4AeGi0Cr12lA/QdN/QdZvSeAl1X1geDri4AJcrx/RgYwEjgXeFFVfcABEVnSUtCq+kvgl+1eeWDffSIyGpgVfCwWkWtUdXFH3m/CrlfXVxN1ent9FaCltTalg+83Uc4SucirDf70N3pe/7r+36f5H6lKYLDA94CpqnpMRJ4m8E2zXvM+aMuAmSLya1WtIfBHfruqLmq8k4jMaeF8JzjJb4yoai3wJvCmiBwCriTQOmeiS2+or5uB89o7pokKvaW+ntvs/cOBYmuN6x2sj1x0mCYiw4J9N64FPgTSCXyYlElgNN2l7RzjSWAB8IqIuIBFwK0iEgcNo/RSgPeBLwf7eOQBM1s6mKr+UlUntfA4IYkL9k0ZEHzuACYQuI1hYlNU11fgBeAsEbmsfoOIXCIi40/ml2CiRrTX1+eBs0VkdvBcScAjwE9O7tdgopW1yIVH8z4cC1W1w0PkgeXAQwT6VbwPvBYc/bmOwLexT4GP2juIqv5GRDKA5wh82xsKfCwiAhwh0Er2GoHbnwXAJ8B7JxFna3KBPwf7igCsAv7QDcc1odGr66uqukXkcuB3IvI7oI7ALbXvdvXYJiSsvop8AXhURB4DBgL3q+rzXT22iQ6i2m4rrzHGGGOigIhcCfwGmKmqduejF7BEzhhjjDEmSlkfOWOMMcaYKGWJnDHGGGNMlLJEzhhjjDEmSlkiZ4wxxhgTpUI2/YiI/BW4HDisquNaKBfg98AcoBq4SVU/bu+4l1xyiS5cuLC7wzWmPZ2eJd3q7Mnx+fwsWVPI26v2cqTUTU5mEhdOy2fW1HycDpus/iR06pcVyvqqfh8VG9+lYsNivOVHcaX3JW3i
BaRNOB9xRHz9+U6JxWuKEPvj7qRQziP3NIG5wp5tpfxSAkuWjASmA48Hf7apuLi4m8IzJjysznacz+fn4efWsLygqGFbcambrbtLWL31EHfeMAWn024khFKo6qv6fRx69ddUb1/ZsM1XXkztvu1U71xLv6vuiLrEJxavyUSfkCVyqvq+iAxtY5crgGc1MP/JChHJFJE8VS1q4z3GmEZipfVKVfH6/CxavrtJEtfY8oIilq4tZPa0IWGOznSHio3vNkl4GqvevpLSj14lZfS0MEfVNZXbVrZ5TZUF75E2cVaYozK9TSRXdhgIFDZ6vS+4rUckckUv/Axv2WFcGbnkXffjSIdjzAlC3Xqlqni8fjx1Pmo9vsDP4OP4Nn8L21rbz3t8f48Pj7fp/h2Z0vKtlXstkYtSFRvaXlr52Psvcez9l8IUTXiUr19siZwJuUgmci01F7T4US4ic4G5APn5+aGMqaG/Q03hFtRbh7e8OPDHaP0dzEkIR51dsqawzdar3/99HcMHZlDrqU+q/E2Sq4bnLSZhgQSupzlS6o50CDEpHPXVW340JMftyTzF+/DX1eKIS2h/Z2M6KZKJ3D5gcKPXg4ADLe2oqvOAeQBTpkwJ2VIULfV3UG8dxW88Zv0dzEkJR519e9XeNsuXrt3H0rX7QnHqTnE5HSTEO0mIc5AQ5yI+LvA6Pi7wSIhzsvnTo5RXeVo9hrvGy2cHyhg2ICOMkce+cNRXV3pffOWt979zpvUlbfx5oTh1yFQUvIuvoqTVcq2pZO+jc0mffCHpZ1yCKz07fMGZXiOSidzrwG0i8hKBQQ5lke4f114fjqOLniTl1Bk4ktJwJqXhSE7D4YoPc5SdY7eKY093tE7Fu05MphLinI22OUioL4s/Xt7y/scTtOb7x8U5O9Rn7+2Ve3jk5fWtllfV1PHd37zLBVPy+c9Lx9A3I6nLvwMTHmkTL6B23/ZWy7PO+3LU3YZ09elP8RuPtbmP311J6bLXKF3+L1LGTCdj6uUkDBpNYOIGY7oulNOPvAicD2SLyD7gJ0AcgKr+CVhAYOqRnQSmH/laqGLpqPb6cJR/vIjyjxc12SZxiTiT05okd86kNJxJ6TiSUoNl6YFtwf0kLiHsf8TessPUlfSI7oemm+RkJlHcRjI3KDeVW66a0DQxcx1PuuJdThw9bEDErKn5rN56qMVbxsmJLqprvKjCO6v38v76/Vxx7nC+OGskyYlxEYjWnIy0CedTvXNti1+Wk0dPJzXKWuOg7WtKHDaR+JzBVG5Ygr+2GtRP1dblVG1dTnz/4WRMvYzU0z6HuKzumq4R7UgP4x5kypQpumbNmpAce8+jN7fZ9N9dxBnXkPA5ktNxJqUGE8H0pklhMPlzJqUhCcmdSv7q+/wdXfRn1FuHuOLoe/G3rM/fyet0xhOqOtte69V3r50UlQMDfD4/S9cW8tbK4yNxL5qez3mnD+aD9ft4bsFWistqGvbPSI3nKxeO5uIZQ3HZ1CSNdarOhvIzVv0+Kgveo3z9YrzlxbjSs0mfdAGp48+L2s+j9q7J73FTsfE9yte8Qd3Rpr2HnCkZpE2+iPTTL8aV1idCV9Bj9KxvlVHEErlG9j9zd5tN/64+/Uk//SJ81eX43ZX43BX4q8sDP90V+KorQP0hiQ2HM5jcpeJMTm9I9k5M/I6XSUIih1/7bavfgK3P30npcYmcz688/OzqFluvZozP484bp0bVFCQdVVvn4/X3d/HK4h24a70N2wfmpPDVy8Zy5rj+dtsqoMclcr2Zqh/3pxsoW/0G7l3rmhY6XKSedhbpU+aQOHBkZAKMPPuj7SRL5BopX7+4zf4OOZd/u80+HKqK1lbjCyZ1fnd5o+cVwcQv+NNdjq+6Ep+7HHzeVo/ZNUIrA4GB9q/HNNHjEjk43nr1+PyNeLx+4l0Obr16AjOnRNc8cp1RVlnLS29v581lu/H5j9fz04Zl8fXPj2X0kKwIRtcjWCLXQ3mOHqB8zQIq
Ni5FPTVNyhIGjiJj6hxSxsxAnJHsxh52sf2BFUKWyDXS0qjVeqFqwVJVtK6maZIX/NlSi199Qqh1tV0+d8KgMQz86gPdcBW9Qo9M5Ord88QyDpdUk5uVzH03nxXSc/U0B45U8syCLSzb2LRl8uyJA7hxzmnkZadEKLKIs0Suh/PXVFGxcSllqxfgLT3UpMyZmkX6GReTPvlCnCm9YpS2JXKdZIlcM/X9HYoXzmvoU5Z9ydwe14fDX1d7/PauuyJ4uzeQ7NVvq9y6HHx1rR7DmZ7NkNufCGPUUa1HJ3IGtn5Wwl//vYlte441bHM5hTlnDePaC0eTnhIdI8y7kSVyUUL9Pqp3fkz56jdw7y5oUibOOFLGnkPG1Dkk9B8WoQjDwhK5TupV7bYdIQ4naRNnUbrsVepKinClZ/fI24+OuAQccQm40vu2uk9d6aE2+/zh96F+X49KUI3prFOHZfGL289hWUERz7yxhaLiKrw+5fUPPmXx6r1cc8EoPn/OcOLjrL6bnkUcTlJGTSVl1FQ8R/ZStnoBlQXvoV4P6qujcuMSKjcuITH/NNKnziFl1LSwfG7btFXRwYZ4tcKVkUtcVh6ujNxIh9JpaRMvaLPcV3mMQ//4JX6PzZZvYoOI8LkJA/jj92cx98rxpCUHWuGqarw8/cYWbnl4MUvXFuL3R9edCNN7xOfkkzPnFvK/M4+sWTc0mUS4Zu8WDs//FYV//C9Kl/8Tn7sipLHUT1vlLTsc0vOYrrFbqzGsrT5/iKNhhG18v2H0/9JdbbbuGbu1Go2q3HX8Y8kOXn9/Fx7v8RHlpwzK4GuXj2XiyJwIRhdydms1BqjfR9UnqyhfvYCavVualIkrntTx55ExdQ7xOd2/tFrh47dRV1JEXFYeg2/9Q7cfvxm7tdpJlsjFuNb6/MXlDObQK7/AVxlYXsaZ2of+X7qLhLxTIhxxj2WJXBQ7cszN3xZuZenaQhp/5E05tR83XX4aQ/qnRy640LFELsbUHvyUstVvUrX5A7RZ/+ekoeNJn3oZySNO77bbrpbIRQe7tRrj6vv81TfP1/f5SxwwkoFfe4j4/sOBwG3WA8/+iKptKyIZrjEhkdMnif/5yun87n/OZ1KjVrg1Ww/xnV8t5dGX11NSXtP6AYzpARL6Dyf3898m//Yn6HPeV3CmHp9ix727gEOvPETh47dTtur/8NdURTBSE06WyPVirvS+DLjhPpJHTQNAvR4Ozf8lpcteI9paao3piOEDM/jZzTO491tnMjQv0ArnV3hr5R7mPvgOzy/c1mSS4e50zxPLuPnBd7jniWUhOb7pPZwpGfQ5+4vk3/Y4uVf+DwkDRzWUeUsPcfTtp9jz6FyKF/0FT7PVJEzssVGrvUT9oI3mgzcc8Yn0++L3KVn6PGXL/wlAydK/UVdygOxL5yJOWwfQxBYR4Ywx/Zg0Kpela/by3JvbKCmvodbj46W3t7NwxW6uu3gMF03Lx9mNS34dLqnmQLG1kpjuI04XqWPPJnXs2dTs30H5mgVUblkGfi/qqaF8zZuUr3mTpFMmkzH1MpKGT0TE2m9ijSVyvURbQ8dFHPSddQNxWXkUvzkP/D4qNiyhrvQQ/a7+Ps6ktDBGakx4OB3C7GlDOHvSQP71/i7mL9mBu9ZHaUUtj/1jA//+YBc3XTaWqaf1syW/TI+XOHAkiQO/S9asGyn/eBEV697CV1UGgHvXOty71hHXdwDpU+aQNuF8HPFJEY7YdBdLzU2D9EmzyfvKPTgSUwGo2bOZA0/fRV2JNc2b2JUY7+La2aOZd9eFzDlrKI7g0maFhyq5768rufvxj/hk77F2jmJMz+BK60PWeV8m/7YnyPn87Q39oAHqjh7g6KK/sPeRuRx952nqjh2MYKSmu1giZ5pIGjqeATc9iKtPfwDqSorY/9RduPdsinBkxoRWZloCt149kT9+fyZnjuvfsH3TrqPc8fv3+eVzazh41G6NmuggrjjSJpzPwK//ggE3PkDK
qWcFpp0C/LXVlK38N4WP3cbBVx7CvbvA+kVHMUvkzAni+w5g4E0PkZh/GgD+mkqKXriPig1LIhyZMaE3KDeNH35tOg99+2xG5/dp2P7++v3c+vASnnx9E5XVnghGaEzHiQiJg8fQ76o7yL/tcTLPugpHQ3cZpfqT1RQ9fy/7/vz/KF/3Nv66WtTvo3z9YrzlxQB4y4spX78Y9fsidyGmVTaPnGmV+uo4suAJKjcubdiWMeNKsmZe3xs7zNo8cr2QqvLhhgM8u2ALB49WN2xPTYrj2gtHcdnnhhHn6ticXTc/+A4HiqsYkJ3CE3fNDlXIjdk8cqZF/rpaKjd/QPnqN/Ac3tukzJGYiiM5DW9J0QnvSx49nX5X3RGq5cGsI2on9br/jU3HiTOOnMu/TdbM6xu2lS3/J4fm/wq/x+bcMrFPRDhn0kAe+99ZfPOKcaQlB0ZxV7rrePL1zdzy8BLe+3ifLfllooojLoH0SbMZ+M3fkHf9vcEpqAJ5lL+mssUkDqB6+0oqC94LY6SmIyyRM20SETLPuorcq7+HuALrVlZvX8mB536Mt6IkwtEZEx5xLidXnHsK8+6+kKvOH0GcK/DRebikml89v5bvPfI+BbuKIxylMSdHREgaOp7+19zJ4G//kYzpX2joR9ea8vWLwxSd6ag2/8VEpEJEylt7hCtIE3mpY2aQd8N9OFMyAfAc3MX+p+6k9uCnkQ3MmDBKTYrja58fy5/uvIDzzxjUsH1HYSl3P/YR9z25ksJDoV3I3JhQiMvsR9/ZX8WR2qfN/er7zZmeo81ETlXTVDUd+B3wA2AgMAi4E7g/5NGZHiVxwAgGfv1h4nOHAuCrKOHAs/dQ9cnqyAZmTJjlZiVzx3Vn8Nv/Po8JI7Ibtq/acpDbfrWUP/5jA8dsyS8TheIystssr1/u0fQcHb21erGqPqaqFaparqqPA1eHMjDTM7nSsxlw4/0kjzgDAK2r4dArD1O64nUbvm56nRGDM7n/lrP48TemM7hfYCSg368sXL6buQ++w4tvbafK7eHtlXsoLnUDUFzq5u2Ve/BZvzrTA6VNvKDN8vRJbZeb8OtoIucTketFxCkiDhG5HrBxyL2UIyGJftfcSca0y4NblJLFz1D85hOoLzTrVBrTU4kIU0/rz6N3nM9t10ykT1oCADUeHy8s2sYN9y7ikZfX4/H6AfB4/Tzy8noefnY1Pp8/kqEbc4K0CeeTPHp6i2XJo6eTOv68MEdk2tPRRO464EvAoeDjmuA200uJw0nfC79G9iVzGzrHVqx7m4Mv3Y/PXRnh6IwJP6fTwcVnDuWJu2Zz3UWjSYwPTNFQ5205WVteUMTStYXhDNGYdonDSb+r7iDn8m8jrsAobXEFZjAI4dQjpgs6lMip6m5VvUJVs1U1R1WvVNXdIY7NRIH0My6m/5d/hCMhGQD37gIOPHO3Lf1ieq2kBBdfuXgM8+6aTWZqQpv7vrVyb5vlxkSCOJykTZzV0B/OlZ5N2sRZlsT1UB1K5ERklIgsFpFNwdcTRORHoQ3NRIvk4RMZ8NWf48rMBaDu6H72P/UD3Hu3RDgyYyKnT3oiLlfbH7FHgv3mjDGmszp6a/XPwF1AHYCqbgS+HKqgTPSJzxnMwJseImHQGAD87gqKXvgpFQXvRjQuYyIpJzOpS+XGGNOejiZyyaq6qtk269VumnCmZJB3/U9IHXduYIPPy5HXH6Xk3RdQtU7dpve5cFp+m+UXTW+73Bhj2tPRRK5YRE4BFEBEvgi0vIaH6dUcrnhyvvAd+px7vMG29KP5HH7tt/jraiMYmTHhN2tqPjPG57VYNmN8HjOnWCJnjOkaVwf3+zYwDxgjIvuBz4Dr236L6a1EhD7nXENc3wEcef1R1FdH1dZleMuO0O+aO3G1M3O4MbHC6RDuvGEKS9cW8vj8jXi8fuJdDm69egIzp+TjdNg64caYruloi9weVZ0N5ABjVPVsVd0TwrhMDEg97XPk3fAznCkZANQe
2MGBp36A57BVHdN7OJ0OZk8bQnawP1x2ZhKzpw2xJM4Y0y06msh9JiLzgDMBmyTMdFjiwFEMuOkh4nICt5C85cXsf+ZuqneujXBksaXohZ9R+PhtFL3ws0iHYowxJow6msiNBt4hcIv1MxH5g4icHbqwTCyJy8xl4FcfIGn4ZADUU8PBlx+ibPWCCEcWO7xlh6krKcJbdjjSoRhjjAmjjk4I7FbVl1X1KmAykA68F9LITExxJCTT/9q7SJ9yaWCD+jn61pMUL/wz6rfV3owxpqdxZeQSl5WHKyM30qGYNnR0sAMich5wLXApsJrAkl3tvecS4PeAE/iLqj7UrPx84F8EBk8AvKqqdm8oRonDSfbF3yQuawBH334K1E/52oXUHTtEv//4HxyJKZEO0RhjTFDedT+OdAhRK5jffE9VL29n1y7rUCInIp8B64GXge+ralUH3uME/ghcCOwDVovI66rafLr/D8JxoabnyJg6h7g+/Tn02m9Qjxv3p+vY/+wP6f+lu4nLtG9+xhhjOu7zd/zLBdwIfAMYDBQCTwLP/PvXV8T8LZ+O9pGbqKr/oaovdiSJC5oG7FTVT1XVA7wEXNGpKE3MSR5xOgO/+nNcGTkA1B0p5MDTP6Bm3/YIR2aMMSZaBJO4vxNI3M4ikMidFXz9crC8U0RkqIhsE5G/iMgmEXleRGaLyEciskNEpgUfy0RkXfDn6BaOkyIifxWR1cH9ujUXajORE5H/DT59QEQeaf5o59gDCWTF9fYFtzU3Q0Q2iMibIjK246GbaBefm8+Amx4iYcBIAHxVZRT97SdUbv4wwpGZnsBG4hpjOuBG4KpWyq4Cbuji8UcQ6CI2ARgDXAecDXwPuBvYBpyrqpOBHwM/b+EYPwSWqOpUYCbwSxHptr5E7WWqW4M/13Ti2C1NkqTNXn8MDFHVShGZA/wTGHnCgUTmAnMB8vNtJvRY4krNJO8/f8qR//sjVVs+Qn11HP7nb6krOUDm2dcgEp1zbVmd7br6kbgm9Ky+mij2jQ6UP9WF43+mqgUAIrIZWKyqKiIFwFAgA3hGREYSyHHiWjjGRcAXROR7wdeJQD7Hc6wuaTORU9V/B59uVNV1J3nsfQSaOOsNAg40O355o+cLROQxEclW1eJm+80jsLIEU6ZMaZ4MmijniEsg98r/5lhWHqUf/gOAY+//nbqSIrIvuxWHK77F9xW98DO8ZYdxZeT2uE65VmdNNLH6aqLY4HbKu/rNpPHakv5Gr/0Ecqj7gKWq+h8iMhR4t4VjCHC1qoak71BH+8j9Jnif+L6TuP25GhgpIsNEJB74MvB64x1EpL8Em1xEZFownqMdPL6JISIOss77Cjlf+A44A98vKje9T9Hz9+KrKmvxPTZ3mjHG9HqF7ZTvDfH5M4D9wec3tbLPIuD2RvnO5O4MoKPzyM0EzgeOAPNEpEBEftTOe7zAbQQuYCvwsqpuFpFbROSW4G5fBDaJyAbgEeDLqtojvg3e88Qybn7wHe55YlmkQ+lV0safx4Dr78WRnA5A7b7t7H/6B3iOtPe3akzPlpuVzIDsFHKzkiMdijGx5MkulnfVL4AHReQjAlOtteQ+ArdcN4rIpuDrbtPh0RyqehB4RESWAv9LoFPf/e28ZwGwoNm2PzV6/gfgDycTcLgcLqnmQHFHB+ia7pQ4+FQG3vQgB//+c+qO7sdbepj9z9xNv6u+R/LwiZEOr0dRv4+Kje/iLQ/0RvCWF1O+fjFpE85HHK19pphIuO/msyIdgjGx6BngMloe8PAq8GxnD6yqu4FxjV7f1ErZqEZvuydY/i7B26yq6gZu7mwc7elQi5yInCoi9wYzyT8Aywj0eTMmJOL69GfATQ+SNGwCAFpbzcGX7qd87aIIR9ZzqN/HoVd/TfEbj6HeusA2bx3FbzzGoVd/bStmGGNiXnCeuGuBrwMfEbjV+lHw9Zd6wzxyHW2Rewp4EbhIVQ+0t7Mx3cGZmEL/a39I8aInqVj3Fqif
4oXz8BzdR1xOfq9vharY+C7V21e2WFa9fSWVBe+RNnFWmKMyxpjw+vevr/ASyFO6Mjo1arXbIhdcoWGXqv7ekjgTbuJ0kX3pXLJm30T9jDblqxdwdMGfen0rVMWGxW2Wl61dGKZIjDHGREq7iZyq+oC+wZGnMc/n8/P2yj0Ul7oBKC518/bKPfj8PWIMRq8kImRO/zz9rrmzYURrS+pboXoLb3nbA7w9RbvY//RdlK74F3Wlh8IUlTHGmHDq6K3VPcBHIvI60DACQFV/E5KoIsTn8/Pwc2tYXnB8ElKP188jL69n9dZD3HnDFJzOjs7YYrpbyqipxPUdSN3hPa3uU75+ca+5nehK74uvvLjNfWr3f0Lt/k8oWfws8f2GkTLmTFLGnEl8tnVxNcaYWNDRRO5A8OEA0kIXTmQtWVPYJIlrbHlBEUvXFjJ72pAwR2Ua89e0PZLY205iE0vSJl5AbRtr08b1HRhYGUH9AHgOfYbn0Gcce+9F4rIHBZO6GcTnDonaFTSiSU+ewNoYE706lMip6k9DHUhP8PaqtucNfGvlXkvkIqy9VihXenYYo4mstAnnU71zbYsDHpJHT6ffVXfgr6mi6pNVVG1bgfuzAvB7Aagr3kfph/+g9MN/4OrTP5DUjT6ThAEjLKkLEVtyzJjoIiLfAW4FPlbV60Nw/HuBSlX9VVeO06FELjh33AmdxFQ1pu5hHQn2i2vNJ4XHePGt7cwYn8eQ/mn2H14EtNcKlT7pgjBGE1nicNLvqjuoLHiP4oXzUG8d4ooj+5K5pI4/D3E4cSankz5pNumTZuOrqaJ6x5pAUvfpetTrAcB77CBly/9J2fJ/4kzrS8qY6aSMOZPEQWN61ShgY0x0+vSBq13AjQTWVR1MYAqSJ4Fnhv9wfldGwP0XcKmqftb1KEOno7dWv9foeSJwNeDt/nAiKyczqWGQQ0t8PuWFRdt4YdE28vqmcOb4PGaMy2P0kD44HJbUhUN7rVCp48+LQFSRIw4naRNnUbrsVepKinClZ7faR9CZmELa+PNIG38efo+b6l3rqNq2guqda1FPDQC+iqOUr15A+eoFOFMySR41jZQxZ5I0ZCzSxkATY4yJhGAS93eaTgg8GDgLuOzTB66+dvgP5590viIifwKGA6+LyEvAKcB4AnnTvar6LxG5CbiSwIoO44BfA/HADQTWZJ2jqiUi8i1gbrBsJ3CDqlY3O98pwB+BHKAa+JaqbutIrB29tbq22aaPRCTmhgdeOC2frbtLWi13OqRh9GrR0Spee3cnr727kz5pCUwfF0jqxo/IJs5lAyJCpSOtUKZ9jvgkUk89i9RTz8Lv9eD+dEMgqduxuqEfoq+qlIp1b1Gx7i0cSakkj5xKypgzSR42EXHFRfgKjDEGCLTEtbSqA8HtN9CJ+eVU9RYRuQSYCfw/YImqfl1EMoFVIvJOcNdxwGQCjVw7gTtVdbKI/DYY2++AV1X1zwAicj+BlsNHm51yHnCLqu4QkenAY0CH7np29NZqVqOXDmAK0L8j740ms6bms3rroRYHPMwYn8ftX5rEuu2HWV5QxNpth3DXBlpsj1XUsnD5bhYu301Koospp/Znxvg8Th+TS1KCtWJ0t5NphTLtc7jiSRk1lZRRU1FfHe7dm6jatoKqT1bhry4HwO+upHLjUio3LkXik0geeUYgqTvldBxxCRG+AmNML/aNDpR3daLgi4AviEj93clEID/4fKmqVgAVIlIG/Du4vQCYEHw+LpjAZQKpBNagbyAiqQRaEF9p1GWrwx+sHc0y1nK8j5wX2E37v7yo43QId94whaVrC3l8/kY8Xj/xLge3Xj2BmVPycTqEcycP4tzJg/DU+diw4wjLC4pYteUgZZWB/kZVNV7eW7eP99btI87lYPKoXGaM78/U0/qTkWr/4ZmeTZxxJJ8ymeRTJpN96VxqCrcGkrptK/FVBlqr1eOmavOHVG3+EHHFk3TKZFLHzCB55Bk4EmxBeGNMWA1u
pzy/nfKOEOBqVW3SQTvYclbbaJO/0Ws/x3Osp4ErVXVD8Hbs+c2O7wBKVXVSZ4JrM5ETkalAoaoOC77+KoH+cbuBLZ05YU/ndDqYPW0IryzewYHiKrIzk1ocqRof52TqaYEEzedXtn52lOWbilhRUMThY4F+dnVeP6u2HGTVloM4BMYOz+bM8f05c1weuX3sPzzTs4nDSdKQcSQNGUffi75O7f4dwaRuBd6ywwCo10P19pWBPotOF8nDJgZa6kZOxZkcszMVGWN6jkLaTubano6iYxYBt4vI7aqqIjJZVdedxPvTgCIRiQOuB/Y3LlTVchH5TESuUdVXJNAsN0FVN3Tk4O21yD0BzAYQkXOBB4HbgUkE7ud+8SQuJGY5HcK4U7IZd0o23/zCOD7dX8byTUWs3HSQ3UXBW1MKBbuKKdhVzJ//uYkRgzI4c1weZ47PI7+fjYA1PZuIg8RBo0kcNJqsC27Ec/AzqrYtp2r7CuqOBlfu83kDA1F2rgVxkDR0HCmjzyR59DRcqX0iewERpH4fFRvf7fVrAxsTIk8SuC3ZVnlX3Uegr9vGYJK1G7j8JN5/D7CSwOIKBbQ8H+/1wOMi8iMgDngJ6JZEzqmq9b3/rwXmqep8YL6IrO/ICXobEeGUQZmcMiiT/7zkVA4UV7Ki4CArNhWxbU8JGrxBvXNfGTv3lfG3hdsYkJ3CjPGBpG7UYBsBa3o2ESEhbzgJecPJmnk9niOFDS11nsO7AzupH/dnG3F/thEW/pnEwWMaVpXoTXP9qd/HoVd/3WSUdf3awNU719LvqjssmTOma54BLqPlAQ+vAs929sCqOrTRy5tbKH+awG3TE/ZvXKaqjwOPt/D+exs9/wy4pDNxtpvIiYhLVb3ABQSGz3b0vQYYkJ3KVTNHcNXMEZSU17By80FWFBSxcecRvL5AVneguIr5S3cyf+lOstITmT6uf8MIWJctCdYqV0Zuk58mMuJzBhOfM5g+51xDXUkRVdtXUrVtBbUHdgT3UGoKt1JTuJWjbz9FwoCRDUldXJ+YGzPVRMXGd1ucKgeOrw1sA3WM6bzhP5zv+/SBq68lMDr1GwT6xO0l0BL3bBfnkYsK7SVjLwLviUgx4AY+ABCREUBZiGOLqNys5CY/u0NWeiKXzhjKpTOGUumuY83WQ6wIjoCt8QTqWkl5DW8u282by3aTkhTH1NP6MWNcHqePziWxCyNg73liGYdLqsnNSua+m9tqhY4etszRcT0lqY3LyiNzxpVkzrgSb3lxQ0tdTeE26sdL1R7YQe2BHZQseY743KHH13/NCXRz6Qm3ItXvQz01+D1u/LXuwE+PG62t31aN31ODeo6X+WvdgdcN+9fgq2h9OiPoXWsDGxMqwXninqLro1OjkqiesGBD0x1EzgTygLdUtSq4bRSQqqofhz7EpqZMmaJr1qwJ92lDqrbOx4ZPAiNgV24+SEW154R94l0OJo/OZcb4PKaN7U9acvxJnePmB9/hQHEVA7JTeOKu2d0Vem/S6fvdsVhnT5a38hjV21dRtX0F7t2bGtZ/bSyu70CSR0+jdv8OavZsOqG8ftmx1pI59dYFEyz38SSr1t0kGVNPs9e1gX3rk7P65E3rals8R3dzpmcz5PYnQnX4TtVZq68mQqxPUSe128Sjqita2PZJaMLpnRLinEwb259pY/vj8/nZ8llJYATspiKOBEfAerx+Vm4+yMrNB3E4hHHD+zJjfB7Tx+aR0ycpwldgTNtcqX1IP+Ni0s+4GF91BdU7VgcmIP5sA/iC678e3U/ZstdaPUb19pXsf/ounElpLSRmNQ3ryEaUw4UjIRFHfBKOhCTqSg83rJrRkt7UX9AYExrWz62HcTodjB+RzfgR2XzrinHs2hcYAbu8oIjCQxUA+P3Kxp3FbNxZzBOvFTBicCYzxuUxY3weg/vZlA+mZ3Mmp5E2cRZpE2fhr6mieufHVG5bjnvXuob1X1vjKdrV7fFIXEJD4iXxSYHn8YlI
QhKO+OSGxKyhLCGp2f6JOBKSA/s0W/GifP1iit94rNVz96a1gY0xoWGJXA8mIowYnMmIwZnccOmp7D9SyYqCIpZvKmL7nmMN++0sLGVnYSnPvbmVgTmpzBgfSOpGDs7E71eWrClsWEO2uNTN2yv3MGtqYIJjYyLJkZhC6rhzSB13Dn5PDXv/cAt+d0XbbxJHINFqMak6vu2EJKu+pSw+KbgtEYlPDGm/O1sb2BgTau32ketprP9GwNEyNys3H2R5QREFO4sb1oBtrG9GIi6ng0Ml1SeUzRifx503TMEZxaNiwzyAw/rIhcH+Z+6mdt/2VsvjB4xi4E0/j6p5F9Xvi9TawNZHznRJtHzG9nbWIhel+mYkMeesYcw5axiV1Z6GNWI/3n6Y2uAI2KNlrffNWV5QxK+eX8u44X1JiHeRmOAkMd5FQpyThHgnifHOwPb4wOuEOGeP+c/T5/OzZE0hWz49isfrt1bGGJI28YI2E7mM0y/sMfWwo2xtYBOtDpdUc6C4KtJhmHZYIhcDUpPjmXnGYGaeMZgaj5f1wRGw7328r8WWunofbjjAhxsOdPg8DQleXNMkLzHe1aQssXEC2MJ+gX0avT6JRNHn8/Pwc2tYXlDUsM3j9fPIy+tZvfVQ1Lcy9nZ2K9IYY06OJXIxJjHeFVj6a1weG3YcabNV7mTVenwNrX3dTYSG1sCG5C+uabKXGO/kyDE3G3cWt3iM5QVFLF1b2OLauCY6iMNJv6vuiNStSGOMiTqWyMWw3D7JbSZy+f3T+NrlY6nxeKn1+Kjx+Kht9LzG46W2rn6778T9GpV1lSrBc/qAtkcutuWtlXstkYtydivSGGM6zhK5GHbhtHy27m59Zvn/OO8Uppzar8vn8fsVj9fXJMmrT/ACyZ6XmtrA88ZlNfXP67ovUTwSHJ1rjDGmc+r7IdtsB9HBErkYNmtqfsMgiOZmjM9j5pT8bjmPwyHBfnEuMrrliE01ThR/Mm85u/a3vjpcTqZNjmyMMZ1l/ZCjj/1rxDCnQ7jzhil899pJxLsC/9TxLgffvXYSd944NWq+WdUnihmpCVz2uWFt7nvR9O5JTo3pbq6MXOKy8iK+Hq4xbVmyprDFL/9wvB+y6VmsRS7GOZ0OZk8bwiuLd3CguIrszKSo7kMWrlZGY7pb3nU/jnQIxjTw1Pk4UurmcEk1h4+5OXKsmsPHqlm56WCb77N+yD2PJXK9RG5WcpOf0aq+lXHp2kL+8q9NuD0+kuKdfPOKccycYv03jDEGoLqmjiPH3Bw+Vt2QrB0+Vt2w7VhFbaeOa/2Qex5L5HqJMMzKHTb1rYz2rdAY0xupKuVVnuOJWqMWtcMlgW2V7rpOHVskMItAa6wfcs9jiZwxxhjTg/j9yrGKmoZE7VBJ9QlJW00np31KTnSR2yeZnD5J5PZJDjyykhq2rd58kEdf2dDq+60fcs8T0kRORC4Bfg84gb+o6kPNyiVYPgeoBm5S1Y9DGZMxxpjIqJ/W4u1VezlS6iYnM4kLp+VH9bQWnbkmry+wtGDz5Kw+YTtS6sbr83cqnvSUeHKzksntk9QkYeuXlUxOn2RSk+LafP8F04awZtth64ccRUKWyImIE/gjcCGwD1gtIq+r6pZGu10KjAw+pgOPB38aY4yJIS1Na1Fc6mbr7pKondairWt6f/1+rjh3OMWlNU36ph0+5qakzE0bqye2SgSy0hObtqg1Ttoyk0hM6Np/6437IT8+fyMer594l4Nbr55g/ZB7qFC2yE0DdqrqpwAi8hJwBdA4kbsCeFZVFVghIpkikqeqLY99NsYYE5Xam9biO795l77piWGOqmuOltWw91BFi2XrPznC+k+OnNTxnA4hOzMp2HpWf+sziZxgi1rfjCTiXKFPdmNttoNYF8pEbiDQeMKZfZzY2tbSPgMBS+SMMSaGvL1qb5vlew9WsPdgy0lRrIiPc55wy7Nxi1qf9ERr8TInLZSJXEu1sXljckf2QUTmAnMB8vPt
/rzp+azOmmgSjvrakWkrXM7oSmK8vrbvjyYnuvjOlyaT0yfQypaeEk+ga7gx3SeUidw+YHCj14OAA53YB1WdB8wDmDJlSid6FhgTXlZnu65+BQRbCSH0wlFfczKTGtbubMmpQ7P4xe3nhOLUIfO/j37Q5nrWQ/qn87mJA8IYUfeKlflHY10oE7nVwEgRGQbsB74MXNdsn9eB24L956YDZdY/zhgDthJCrLlwWn6bSU80TmsRi9fUWCzNPxrLQtZrUlW9wG3AImAr8LKqbhaRW0TkluBuC4BPgZ3An4H/ClU8xhhjImfW1HxmjM9rsSxap7WIxWsy0Ue0rSmceyAROQLsCdPpsoHiMJ0rHGLteiB811Ssqpd05o1hrLP279vzhfN6OlVnQ11f45Mz+sYnpmV7Pe4kV3yS21NTUeypLjsaqvOFQyxeUyM9/jO2t4u6RC6cRGSNqk6JdBzdJdauB2LzmjorFn8XsXZNsXY9XRGLvwu7JhMJ0TX7ojHGGGOMaWCJnDHGGGNMlLJErm3zIh1AN4u164HYvKbOisXfRaxdU6xdT1fE4u/CrsmEnfWRM8YYY4yJUtYiZ4wxxhgTpSyRa0ZE/ioih0VkU6Rj6S4iMlhElorIVhHZLCLfjXRMXSUiiSKySkQ2BK/pp5GOKVJirc5afY1tsVZfIfbqrNXX6GK3VpsRkXOBSuBZVR0X6Xi6g4jkAXmq+rGIpAFrgStVdUuEQ+s0CSxYmKKqlSISB3wIfFdVV0Q4tLCLtTpr9TW2xVp9hdirs1Zfo4u1yDWjqu8Dra+5EoVUtUhVPw4+ryCw0sbAyEbVNRpQGXwZF3z0ym8lsVZnrb7GtlirrxB7ddbqa3SxRK6XEZGhwGRgZYRD6TIRcYrIeuAw8LaqRv01maasvppoEyt11upr9LBErhcRkVRgPvDfqloe6Xi6SlV9qjoJGARME5GYuE1jAqy+mmgTS3XW6mv0sESulwj2c5gPPK+qr0Y6nu6kqqXAu4Ct0xcjrL6aaBOrddbqa89niVwvEOy4+iSwVVV/E+l4uoOI5IhIZvB5EjAb2BbRoEy3sPpqok2s1Vmrr9HFErlmRORFYDkwWkT2icg3Ih1TN/gccAMwS0TWBx9zIh1UF+UBS0VkI7CaQB+O/4twTBERg3XW6msMi8H6CrFXZ62+RhGbfsQYY4wxJkpZi5wxxhhjTJSyRM4YY4wxJkpZImeMMcYYE6UskTPGGGOMiVKWyBljjDHGRClL5KKQiPiCw9s3icgrIpLcxr73isj3whmfMY1ZfTXRxuqsiSaWyEUnt6pOUtVxgAe4JdIBGdMGq68m2lidNVHDErno9wEwAkBEbhSRjSKyQUSea76jiHxLRFYHy+fXf8sUkWuC3zw3iMj7wW1jRWRV8FvpRhEZGdarMrHK6quJNlZnTY9mEwJHIRGpVNVUEXERWNtvIfA+8CrwOVUtFpEsVS0RkXuBSlX9lYj0VdWjwWPcDxxS1UdFpAC4RFX3i0imqpaKyKPAClV9XkTiAaequiNywSaqWX010cbqrIkm1iIXnZJEZD2wBthLYI2/WcA/VLUYQFVLWnjfOBH5IPihcj0wNrj9I+BpEfkW4AxuWw7cLSJ3AkPsA8Z0gdVXE22szpqo4Yp0AKZT3Ko6qfEGERGgvebVp4ErVXWDiNwEnA+gqreIyHTgMmC9iExS1RdEZGVw2yIR+aaqLuneyzC9hNVXE22szpqoYS1ysWMx8CUR6QsgIlkt7JMGFIlIHIFviwT3PUVVV6rqj4FiYLCIDAc+VdVHgNeBCSG/AtObWH010cbqrOmRrEUuRqjqZhF5AHhPRHzAOuCmZrvdA6wE9gAFBD50AH4Z7GgrBD6sNgA/AP5TROqAg8DPQn4Rptew+mqijdVZ01PZYAdjjDHGmChlt1aNMcYYY6KUJXLGGGOMMVHKEjljjDHGmChliZwxxhhjTJSyRM4YY4wxJkpZImeMMcYYE6UskTPGGGOMiVKW
yBljjDHGRKn/D5j4HJRj7x5uAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 640.74x158.4 with 3 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# embarked feature visualization\n",
    "embark = sns.FacetGrid(data_train, col='Embarked', size=2.2, aspect=1.2)\n",
    "embark.map(sns.pointplot, 'Pclass', 'Survived', 'Sex', palette='deep')\n",
    "embark.add_legend()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9dd9ab3d",
   "metadata": {},
   "source": [
    "* **Observations:**\n",
    "\n",
    "1.\tFemale passengers in ‘S’ and ‘Q’ unsurprisingly have a better surviving rate, but ‘Cherbourg’ is an exception. This could be a correlation between pclass and Embarked and in turn pclass and Survived, not necessarily direct correlation between Embarked and Survived.\n",
    "\n",
    "2.\tMales in class 3 have a better surviving rate than class 2 in ‘Q’ and ‘C’ points.\n",
    "\n",
    "* **Conclusions:**\n",
    "\n",
    "Embarked feature should be completed and sex should be added into model training.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "93ac3a9c",
   "metadata": {},
   "source": [
    "#### 3.4 Data Wrangling（数据映射）\n",
    "\n",
    "Now that we have collected several assumptions and conclusions through analyzing the data set, but we haven’t made any changes yet. As mentioned in section 3.2 before, we have to correct, complete, and also classify the data we’ve got. So in this section, we’ll execute the previously mentioned goals and make the dataset more applicable for model training.\n",
    "\n",
    "##### 3.4.1 Correcting Dataset: Dropping Features\n",
    "\n",
    "As previously mentioned, there are several features that have little correlation with the surviving rate,(like ‘cabin’ ‘ticket’ feature) so we could drop them out of the dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "f897a79c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# data wrangling process\n",
    "# drop the 'ticket'&'cabin'feature out of dataset\n",
    "data_train = data_train.drop(['Ticket', 'Cabin'], axis=1)\n",
    "data_test = data_test.drop(['Ticket', 'Cabin'], axis=1)\n",
    "#drop the 'name'&'passengerid'feature out of dataset\n",
    "data_train = data_train.drop(['Name', 'PassengerId'], axis=1)\n",
    "data_test = data_test.drop(['Name', 'PassengerId'], axis=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "98753c4c",
   "metadata": {},
   "source": [
    "##### 3.4.2 Converting Categorical features to numerical values\n",
    "\n",
    "Through this operation we can convert features which contain strings to numerical values, which is highly required for the following model training process."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "ee0ece2c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7.9250</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>35.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Survived  Pclass  Sex   Age  SibSp  Parch     Fare Embarked\n",
       "0         0       3    0  22.0      1      0   7.2500        S\n",
       "1         1       1    1  38.0      1      0  71.2833        C\n",
       "2         1       3    1  26.0      0      0   7.9250        S\n",
       "3         1       1    1  35.0      1      0  53.1000        S\n",
       "4         0       3    0  35.0      0      0   8.0500        S"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "combine=[data_train,data_test] # conduct operations in two datasets altogether\n",
    "for dataset in combine:\n",
    "    dataset['Sex'] = dataset['Sex'].map( {'female': 1, 'male': 0} ).astype(int)\n",
    "data_train.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4bfe2aab",
   "metadata": {},
   "source": [
    "#### 3.4.3 Completing empty numerical values\n",
    "\n",
    "With previous observation we can note that age feature is correlated with other features, so an effective way of age guessing is to separate passengers into several groups and use median values as the guessing age. In this section, we use pclass and sex feature as auxiliary feature to predict the missing age value. So, we could have six different groups i.e., six guessing ages. We should set an empty matrix first to store the guessing numbers.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "6808ddc7",
   "metadata": {},
   "outputs": [],
   "source": [
    "# guess the empty age values through pclass and gender\n",
    "guess_age = np.zeros((2, 3))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "79ea91b6",
   "metadata": {},
   "source": [
    "Then we can predict the passengers’ ages via median and fill the blanks."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "7c274b71",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 891 entries, 0 to 890\n",
      "Data columns (total 8 columns):\n",
      " #   Column    Non-Null Count  Dtype  \n",
      "---  ------    --------------  -----  \n",
      " 0   Survived  891 non-null    int64  \n",
      " 1   Pclass    891 non-null    int64  \n",
      " 2   Sex       891 non-null    int32  \n",
      " 3   Age       891 non-null    int32  \n",
      " 4   SibSp     891 non-null    int64  \n",
      " 5   Parch     891 non-null    int64  \n",
      " 6   Fare      891 non-null    float64\n",
      " 7   Embarked  889 non-null    object \n",
      "dtypes: float64(1), int32(2), int64(4), object(1)\n",
      "memory usage: 48.9+ KB\n"
     ]
    }
   ],
   "source": [
    "for dataset in combine:\n",
    "    for i in range(0, 2):\n",
    "        for j in range(0, 3):\n",
    "            # clear the null value in certain sex and pclass cluster\n",
    "            guess = dataset[(dataset['Sex'] == i) & \\\n",
    "                            (dataset['Pclass'] == j + 1)]['Age'].dropna()\n",
    "            age_pred = guess.median()  # find median number of certain group\n",
    "\n",
    "            # the guess age of certain group\n",
    "            guess_age[i, j] = int(age_pred / 0.5 + 0.5) * 0.5\n",
    "\n",
    "    for i in range(0, 2):\n",
    "        for j in range(0, 3):\n",
    "            # fill the missing age values\n",
    "            # according to the group of each passengers\n",
    "            dataset.loc[(dataset.Age.isnull()) & (dataset.Sex == i) & \\\n",
    "                        (dataset.Pclass == j + 1), 'Age'] = guess_age[i, j]\n",
    "            # turn the age value into integer\n",
    "    dataset['Age'] = dataset['Age'].astype(int)\n",
    "    \n",
    "data_train.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7b84788f",
   "metadata": {},
   "source": [
    "Now that the age feature is complete, we can transfer the continuous age value into discrete value. The reason why this process is undertaken is that we can convert certain age numbers into a age band, as previously mentioned, the surviving rate has a lot to do with the age section that passengers belong to, so that means instead of a complex and irregular age number, the age group of each passenger is **a more definitive indicator of his or her surviving rate.** Thus, we divide the age feature into several 16-year-old age bands. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "c8113311",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>AgeBand</th>\n",
       "      <th>Survived</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>(-0.08, 16.0]</td>\n",
       "      <td>0.550000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>(16.0, 32.0]</td>\n",
       "      <td>0.337374</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>(32.0, 48.0]</td>\n",
       "      <td>0.412037</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>(48.0, 64.0]</td>\n",
       "      <td>0.434783</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>(64.0, 80.0]</td>\n",
       "      <td>0.090909</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         AgeBand  Survived\n",
       "0  (-0.08, 16.0]  0.550000\n",
       "1   (16.0, 32.0]  0.337374\n",
       "2   (32.0, 48.0]  0.412037\n",
       "3   (48.0, 64.0]  0.434783\n",
       "4   (64.0, 80.0]  0.090909"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# surving rate in 5 age bands\n",
    "data_train['AgeBand'] = pd.cut(data_train['Age'], 5)\n",
    "data_train[['AgeBand', 'Survived']].groupby(['AgeBand'], as_index=False)\\\n",
    ".mean().sort_values(by='AgeBand', ascending=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "7b950057",
   "metadata": {},
   "outputs": [],
   "source": [
    "# divide the age number in dataset into five age groups\n",
    "for dataset in combine:    \n",
    "    dataset.loc[ dataset['Age'] <= 16, 'Age'] = 0\n",
    "    dataset.loc[(dataset['Age'] > 16) & (dataset['Age'] <= 32), 'Age'] = 1\n",
    "    dataset.loc[(dataset['Age'] > 32) & (dataset['Age'] <= 48), 'Age'] = 2\n",
    "    dataset.loc[(dataset['Age'] > 48) & (dataset['Age'] <= 64), 'Age'] = 3\n",
    "    dataset.loc[ dataset['Age'] > 64, 'Age']\n",
    "# we can remove age band after use\n",
    "data_train = data_train.drop(['AgeBand'], axis=1)\n",
    "combine = [data_train, data_test]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1014e695",
   "metadata": {},
   "source": [
    "##### 3.4.4 Completing empty ‘embarked’ values\n",
    "\n",
    "Noted that the embarked feature only have 2 missing values in train dataset, so we could simply use the most common occurrence to fill them out.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "16e5cd22",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'S'"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# fill the 'embarked' feature with most common occurence\n",
    "# find the most frequently occured embarked port\n",
    "freq_port = data_train.Embarked.dropna().mode()[0]\n",
    "freq_port"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "c964b1a4",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Embarked</th>\n",
       "      <th>Survived</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>C</td>\n",
       "      <td>0.553571</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Q</td>\n",
       "      <td>0.389610</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>S</td>\n",
       "      <td>0.339009</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Embarked  Survived\n",
       "0        C  0.553571\n",
       "1        Q  0.389610\n",
       "2        S  0.339009"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# fill the 'embarked' feature with most common occurence\n",
    "# replace null value in train dataset\n",
    "for dataset in combine:\n",
    "    dataset['Embarked'] = dataset['Embarked'].fillna(freq_port)\n",
    "    \n",
    "# surviving rate across embarked feature after completing\n",
    "data_train[['Embarked', 'Survived']].groupby(['Embarked'], as_index=False)\\\n",
    ".mean().sort_values(by='Survived', ascending=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5c48e11b",
   "metadata": {},
   "source": [
    "And we can also convert this categorical feature into discretized numerical values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "4eb0ce9d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# convert embarked feature to numerical values\n",
    "for dataset in combine:\n",
    "    dataset['Embarked'] = dataset['Embarked'].map( {'S': 0, 'C': 1, 'Q': 2} ).astype(int)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5662233b",
   "metadata": {},
   "source": [
    "##### 3.4.5 Completing empty ‘fare’ values\n",
    "\n",
    "The fare feature only has one missing value, so we can also use median to replace the null value. Meanwhile, we could also divide the fare feature into several fare bands.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "8f4df5b1",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>FareBand</th>\n",
       "      <th>Survived</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>(-0.001, 7.91]</td>\n",
       "      <td>0.197309</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>(7.91, 14.454]</td>\n",
       "      <td>0.303571</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>(14.454, 31.0]</td>\n",
       "      <td>0.454955</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>(31.0, 512.329]</td>\n",
       "      <td>0.581081</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          FareBand  Survived\n",
       "0   (-0.001, 7.91]  0.197309\n",
       "1   (7.91, 14.454]  0.303571\n",
       "2   (14.454, 31.0]  0.454955\n",
       "3  (31.0, 512.329]  0.581081"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# fill the single missing fare value in test set using median\n",
    "data_test['Fare'].fillna(data_test['Fare'].dropna().median(), inplace=True)\n",
    "# cut the fare feature into 4 fare bands\n",
    "data_train['FareBand'] = pd.qcut(data_train['Fare'], 4)\n",
    "data_train[['FareBand', 'Survived']].groupby(['FareBand'], as_index=False)\\\n",
    ".mean().sort_values(by='FareBand', ascending=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "e31af3f3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Survived  Pclass  Sex  Age  SibSp  Parch  Fare  Embarked\n",
       "0         0       3    0    1      1      0     0         0\n",
       "1         1       1    1    2      1      0     3         1\n",
       "2         1       3    1    1      0      0     1         0\n",
       "3         1       1    1    2      1      0     3         0\n",
       "4         0       3    0    2      0      0     1         0"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# operating the dataset\n",
    "# boundray of each band has been accquired above\n",
    "for dataset in combine:\n",
    "    dataset.loc[ dataset['Fare'] <= 7.91, 'Fare'] = 0\n",
    "    dataset.loc[(dataset['Fare'] > 7.91) & (dataset['Fare'] <= 14.454), 'Fare'] = 1\n",
    "    dataset.loc[(dataset['Fare'] > 14.454) & (dataset['Fare'] <= 31), 'Fare']   = 2\n",
    "    dataset.loc[ dataset['Fare'] > 31, 'Fare'] = 3\n",
    "    dataset['Fare'] = dataset['Fare'].astype(int)\n",
    "\n",
    "data_train = data_train.drop(['FareBand'], axis=1)\n",
    "combine = [data_train,data_test]\n",
    "\n",
    "data_train.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1fbebdda",
   "metadata": {},
   "source": [
    "##### 3.4.6 Creating new feature ‘Alone’\n",
    "\n",
    "For the ‘SibSp’ & ‘Parch’ value, we can combine them into a new feature called ‘Alone’ to see if the passenger was travelling alone."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "9d3da8ff",
   "metadata": {},
   "outputs": [],
   "source": [
    "# creating new feature 'alone'\n",
    "for dataset in combine:\n",
    "    dataset['famlies'] = dataset['SibSp'] + dataset['Parch'] + 1\n",
    "for dataset in combine:\n",
    "    dataset['IsAlone'] = 0\n",
    "    dataset.loc[dataset['famlies'] == 1, 'IsAlone'] = 1\n",
    "# drop the 'families' 'sibsp' & 'parch' features\n",
    "data_train = data_train.drop(['Parch', 'SibSp', 'famlies'], axis=1)\n",
    "data_test = data_test.drop(['Parch', 'SibSp', 'famlies'], axis=1)\n",
    "combine = [data_train, data_test]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c3bd6132",
   "metadata": {},
   "source": [
    "##### 3.4.7 Dataset appearance after wrangling"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "3e8984fe",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Embarked</th>\n",
       "      <th>IsAlone</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>886</th>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>887</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>888</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>889</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>890</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>891 rows × 7 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "     Survived  Pclass  Sex  Age  Fare  Embarked  IsAlone\n",
       "0           0       3    0    1     0         0        0\n",
       "1           1       1    1    2     3         1        0\n",
       "2           1       3    1    1     1         0        1\n",
       "3           1       1    1    2     3         0        0\n",
       "4           0       3    0    2     1         0        1\n",
       "..        ...     ...  ...  ...   ...       ...      ...\n",
       "886         0       2    0    1     1         0        1\n",
       "887         1       1    1    1     2         0        1\n",
       "888         0       3    1    1     2         0        0\n",
       "889         1       1    0    1     2         1        1\n",
       "890         0       3    0    1     0         2        1\n",
       "\n",
       "[891 rows x 7 columns]"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data_train"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "e907c168",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Embarked</th>\n",
       "      <th>IsAlone</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>413</th>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>414</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>415</th>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>416</th>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>417</th>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>418 rows × 6 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "     Pclass  Sex  Age  Fare  Embarked  IsAlone\n",
       "0         3    0    2     0         2        1\n",
       "1         3    1    2     0         0        0\n",
       "2         2    0    3     1         2        1\n",
       "3         3    0    1     1         0        1\n",
       "4         3    1    1     1         0        0\n",
       "..      ...  ...  ...   ...       ...      ...\n",
       "413       3    0    1     1         0        1\n",
       "414       1    1    2     3         1        1\n",
       "415       3    0    2     0         0        1\n",
       "416       3    0    1     1         0        1\n",
       "417       3    0    1     2         1        0\n",
       "\n",
       "[418 rows x 6 columns]"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data_test"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a9fb4fd3",
   "metadata": {},
   "source": [
    "### 4. Main Program and Test Results\n",
    "\n",
    "#### 4.1 Main Program Using 5 Different Algorithms\n",
    "\n",
    "Now that our dataset is ready for training, we choose some appropriate modelling algorithms. Our problem is a binary classification problem: we need to learn the relationship between the output (whether a passenger survived) and the input features. The learning process is supervised, because the survival outcome of every passenger in the training set is given. We therefore choose the modelling algorithms listed below:\n",
    "\n",
    "* Logistic Regression\n",
    "\n",
    "* KNN\n",
    "\n",
    "* SVM\n",
    "\n",
    "* Perceptron\n",
    "\n",
    "* Random Forest\n",
    "\n",
    "We use scikit-learn (`sklearn`) to invoke these algorithms:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "5cd8e0ca",
   "metadata": {},
   "outputs": [],
   "source": [
    "# machine learning algorithms\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "from sklearn.svm import SVC, LinearSVC\n",
    "from sklearn.neighbors import KNeighborsClassifier\n",
    "from sklearn.linear_model import Perceptron\n",
    "from sklearn.ensemble import RandomForestClassifier"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bc6bacc3",
   "metadata": {},
   "source": [
    "The training code for each model is listed below. Each reported accuracy is computed on the training set itself:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "ed68fbde",
   "metadata": {},
   "outputs": [],
   "source": [
    "# define the array we use in model training\n",
    "X_train=data_train.drop(\"Survived\", axis=1)\n",
    "Y_train=data_train[\"Survived\"]\n",
    "X_test=data_test.copy()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "fa92042a",
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Logistic Regression:  78.56\n"
     ]
    }
   ],
   "source": [
    "# logistic regression\n",
    "logregression=LogisticRegression()\n",
    "logregression.fit(X_train,Y_train)\n",
    "Y_pred = logregression.predict(X_test)\n",
    "acc_logreg = round(logregression.score(X_train, Y_train) * 100, 2)\n",
    "print(\"Logistic Regression: \",acc_logreg)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "43694776",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "KNN:  83.5\n"
     ]
    }
   ],
   "source": [
    "# KNN\n",
    "knn = KNeighborsClassifier(n_neighbors = 3)\n",
    "knn.fit(X_train, Y_train)\n",
    "Y_pred = knn.predict(X_test)\n",
    "acc_knn = round(knn.score(X_train, Y_train) * 100, 2)\n",
    "print(\"KNN: \",acc_knn)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "480b6be6",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Support Vector Machine:  78.56\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "C:\\Users\\86187\\anaconda3\\lib\\site-packages\\sklearn\\svm\\_base.py:985: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.\n",
      "  warnings.warn(\"Liblinear failed to converge, increase \"\n"
     ]
    }
   ],
   "source": [
    "# SVM\n",
    "linear_svc = LinearSVC()\n",
    "linear_svc.fit(X_train, Y_train)\n",
    "Y_pred = linear_svc.predict(X_test)\n",
    "acc_linear_svc = round(linear_svc.score(X_train, Y_train) * 100, 2)\n",
    "print(\"Support Vector Machine: \",acc_linear_svc)"
   ]
  },
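  {
   "cell_type": "markdown",
   "id": "3b7d52e8",
   "metadata": {},
   "source": [
    "The `ConvergenceWarning` in the output above means the liblinear solver reached its iteration limit before converging, so the fitted weights may not be final. One common remedy is to raise `max_iter`. The sketch below is only an illustration under stated assumptions: `X_demo` and `y_demo` are synthetic stand-ins for `X_train` and `Y_train`, and the value `10000` is an arbitrary choice, not a tuned setting.\n",
    "\n",
    "```python\n",
    "import warnings\n",
    "\n",
    "import numpy as np\n",
    "from sklearn.exceptions import ConvergenceWarning\n",
    "from sklearn.svm import LinearSVC\n",
    "\n",
    "# Synthetic stand-in for X_train / Y_train (assumption: swap in the real data).\n",
    "rng = np.random.default_rng(0)\n",
    "X_demo = rng.integers(0, 4, size=(200, 6))\n",
    "y_demo = (X_demo[:, 1] + rng.integers(0, 2, size=200) > 2).astype(int)\n",
    "\n",
    "with warnings.catch_warnings(record=True) as caught:\n",
    "    warnings.simplefilter('always')\n",
    "    # max_iter defaults to 1000; a larger limit gives the solver more room to converge.\n",
    "    svc_demo = LinearSVC(max_iter=10000)\n",
    "    svc_demo.fit(X_demo, y_demo)\n",
    "\n",
    "# Record whether the solver still reported a convergence problem.\n",
    "still_unconverged = any(issubclass(w.category, ConvergenceWarning) for w in caught)\n",
    "acc_demo = svc_demo.score(X_demo, y_demo)\n",
    "print('converged:', not still_unconverged, '| training accuracy:', round(acc_demo * 100, 2))\n",
    "```\n",
    "\n",
    "If the warning persists on the real data even with a larger limit, standardizing the features before fitting is another option worth trying."
   ]
  },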
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "a97d9a1d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Perceptron:  74.3\n"
     ]
    }
   ],
   "source": [
    "# Perceptron\n",
    "perceptron = Perceptron()\n",
    "perceptron.fit(X_train, Y_train)\n",
    "Y_pred = perceptron.predict(X_test)\n",
    "acc_perceptron = round(perceptron.score(X_train, Y_train) * 100, 2)\n",
    "print(\"Perceptron: \",acc_perceptron)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "5d4b86c6",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Random Forest:  85.75\n"
     ]
    }
   ],
   "source": [
    "# Random Forest\n",
    "random_forest = RandomForestClassifier(n_estimators=100)\n",
    "random_forest.fit(X_train, Y_train)\n",
    "Y_pred = random_forest.predict(X_test)\n",
    "acc_random_forest = round(random_forest.score(X_train, Y_train) * 100, 2)\n",
    "print(\"Random Forest: \",acc_random_forest)"
   ]
  },
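  {
   "cell_type": "markdown",
   "id": "7c1e9a40",
   "metadata": {},
   "source": [
    "The accuracies printed above are measured on the same rows the models were fitted on, so they may overstate performance on unseen passengers. A minimal sketch of a held-out comparison using 5-fold cross-validation follows. It is an illustration rather than part of the original pipeline: `X_demo` and `y_demo` are synthetic stand-ins (an assumption) that would be replaced by `X_train` and `Y_train`, and the choice of two models and `cv=5` is arbitrary.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from sklearn.ensemble import RandomForestClassifier\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "from sklearn.model_selection import cross_val_score\n",
    "\n",
    "# Synthetic stand-in for X_train / Y_train (assumption: swap in the real data).\n",
    "rng = np.random.default_rng(0)\n",
    "X_demo = rng.integers(0, 4, size=(200, 6))\n",
    "y_demo = (X_demo[:, 1] + rng.integers(0, 2, size=200) > 2).astype(int)\n",
    "\n",
    "models = {\n",
    "    'Logistic Regression': LogisticRegression(),\n",
    "    'Random Forest': RandomForestClassifier(n_estimators=100, random_state=0),\n",
    "}\n",
    "\n",
    "cv_scores = {}\n",
    "for name, model in models.items():\n",
    "    # Each of the 5 folds is scored on rows the model did not see while fitting.\n",
    "    scores = cross_val_score(model, X_demo, y_demo, cv=5, scoring='accuracy')\n",
    "    cv_scores[name] = scores.mean()\n",
    "    print(f'{name}: {scores.mean() * 100:.2f} (+/- {scores.std() * 100:.2f})')\n",
    "```\n",
    "\n",
    "With the real training data, a model whose cross-validated accuracy falls well below its training accuracy is likely overfitting."
   ]
  },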
  {
   "cell_type": "markdown",
   "id": "58d28f0c",
   "metadata": {},
   "source": [
    "**Random Forest has the highest training-set accuracy (85.75), so we submit its predictions to the Kaggle website.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "id": "8b414cbf",
   "metadata": {},
   "outputs": [],
   "source": [
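    "# Recompute the predictions explicitly with the random forest model\n",
    "Y_pred = random_forest.predict(X_test)\n",
    "submission_test=pd.read_csv(\"data/test.csv\")\n",
    "submission = pd.DataFrame({\n",
    "        \"PassengerId\": submission_test[\"PassengerId\"],\n",
    "        \"Survived\": Y_pred })\n",
    "# index=False keeps only the PassengerId and Survived columns, as in gender_submission.csv\n",
    "submission.to_csv('submission.csv', index=False)"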
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a9bc7386",
   "metadata": {},
   "source": [
    "![1.3](figure/1.3.jpg)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9ac57f3d",
   "metadata": {},
   "source": [
    "### 5. Gains from the Challenge\n",
    "\n",
    "The key to this program is not designing the algorithm but processing the data correctly. We need to identify the most relevant features and drop the irrelevant ones, and we need to convert categorical and continuous numerical values into discretized numerical values so that the classifiers can handle them well.\n",
    "\n",
    "This program gave me a deeper understanding of the data analysis process, which will be very useful for solving practical problems in future coursework. It also taught me how to decompose a complicated task and solve it layer by layer. Finally, writing this report in English improved my English writing and thinking skills.\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
